Be careful when asking that question: "2 audio tracks" usually means a dub and the original (or a director's commentary or something), and how that is done is container dependent (AVI has a crude hack, but MKV is the chosen method here).
I am slightly confused as to what you want and what you have. If I understand you,
you have a video file with a single audio track.
This track is a stereo track, but it is odd in that the left channel is just music (no vocals) while the right is the music and some vocals. That must sound kind of odd, but that is beside the point.
However you have a 3 channel speaker setup (left, right and middle).
While you could re-encode the audio, you might be able to get away with doing it in real time at the mixer level.
Most codec packs (and so most playback setups) use FFDShow. CCCP is a good pack if you want one:
http://www.cccp-project.net/
The FFDShow mixer is described here:
http://mewiki.project357.com/wiki/FFDShow_reference#Mixer
Failing that, most players will have something like it.
Simply stick a non-zero number in the box you want (the number is the volume sent to that speaker, with 1 being full volume).
You can, however, have multiple "speakers" playing through one speaker (it is how playing a 5.1 track through stereo speakers works) by simply sticking a 1 in each of the relevant boxes.
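To make the "boxes" idea concrete, here is a minimal sketch of what a mixer matrix does to one moment of audio. It is not FFDShow's actual code; the function name `mix` and the example routing values are made up for illustration, and each row simply stands for one output speaker.

```python
def mix(matrix, frame):
    """Apply a mixer matrix to one frame of input samples.

    matrix[o][i] is the number in the "box" for output speaker o
    and input channel i (1.0 = full volume, 0.0 = not routed).
    """
    return [sum(gain * sample for gain, sample in zip(row, frame))
            for row in matrix]


# Hypothetical routing of a stereo source (left, right) to three speakers.
routing = [
    [1.0, 0.0],  # left speaker: left channel at full volume
    [0.0, 1.0],  # right speaker: right channel at full volume
    [0.5, 0.5],  # centre speaker: both channels at half volume
]

frame = [0.2, 0.4]          # one left sample and one right sample
out = mix(routing, frame)   # one sample per speaker
```

Any box left at 0 means that input never reaches that speaker, and more than one non-zero box in a row is the "multiple speakers through one speaker" case.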
You now have a problem: removing vocals to leave a "pleasant" track behind is notoriously difficult. You appear to have lucked out though, as you have the "master" track.
So without going into proper audio engineering territory, try placing a "-1" in the box for the music-only channel. This will essentially invert it, and courtesy of physics it will cancel out the same sound wherever the two are mixed together.
Mix that inverted channel with the music + vocals channel and, by essentially adding a -music, you are left with the vocals. This is a drastic oversimplification of course (it assumes a "pure" audio track without the issues of encoding, decoding, post processing, the invert itself and less than flawless recording equipment), and you might get better results using a number smaller than 1 (ignoring the sign) and otherwise reducing the volume of the track.
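As a rough illustration of why the -1 trick works, here is the arithmetic on a handful of made-up samples. The function name `isolate_vocals` and the `gain` parameter are mine, not something from FFDShow, and real audio will never cancel this cleanly.

```python
def isolate_vocals(left, right, gain=1.0):
    """Return right - gain * left, sample by sample.

    left  is assumed to hold the music only,
    right is assumed to hold the same music plus the vocals.
    gain=1.0 corresponds to putting -1 in the mixer box; smaller values
    correspond to the "number less than 1" suggestion.
    """
    return [r - gain * l for l, r in zip(left, right)]


# Made-up sample values, purely for illustration.
music = [0.10, -0.20, 0.30, 0.05]
vocals = [0.02, 0.04, -0.03, 0.00]

left = music
right = [m + v for m, v in zip(music, vocals)]

centre = isolate_vocals(left, right)  # approximately equal to vocals
```

If the music in the two channels is not identical (different levels, a slight delay, lossy compression), some of it will survive the subtraction, which is where playing with the gain comes in.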
If that does not work, human vocals are usually fairly well contained frequency-wise, and if you are lucky your instruments will fall outside that range. You then have various filters to try to kill frequency ranges:
http://www.dspguru.com/info/faqs/index.htm
Note this is how such things are usually done if you are not as lucky as you are here.
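For a feel of what "killing frequency ranges" means, here is a very crude sketch using the simplest possible (one-pole) filters. The 300 to 3400 Hz band is my assumption, borrowed from the telephone voice band, and all the function names are made up; proper tools use much steeper filters than this.

```python
import math


def low_pass(samples, cutoff_hz, sample_rate):
    """One-pole (RC style) low-pass: attenuates content above cutoff_hz."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def high_pass(samples, cutoff_hz, sample_rate):
    """Input minus its low-passed copy: attenuates content below cutoff_hz."""
    low = low_pass(samples, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(samples, low)]


def band_pass(samples, low_hz, high_hz, sample_rate):
    """Keep (roughly) the band between low_hz and high_hz."""
    return low_pass(high_pass(samples, low_hz, sample_rate), high_hz, sample_rate)


def tone(freq_hz, sample_rate, seconds=0.5):
    """Generate a sine test tone."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(samples):
    """Root-mean-square level, used here as a loudness estimate."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


RATE = 44100
LOW_HZ, HIGH_HZ = 300.0, 3400.0  # assumed "vocal" band, not a measured one

hum = band_pass(tone(50.0, RATE), LOW_HZ, HIGH_HZ, RATE)           # below the band
voice_like = band_pass(tone(1000.0, RATE), LOW_HZ, HIGH_HZ, RATE)  # inside the band
```

Comparing `rms(hum)` with `rms(voice_like)` shows the tone below the band coming out much quieter than the one inside it. Real instruments overlap the vocal range heavily, so expect this approach to be far less clean than the subtraction.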
If you do want to encode then you will essentially be doing the same thing. Do yourself a favour though and do it in something like Audacity (you can dump the audio to WAV using VirtualDub) rather than fiddling with the audio options that most video programs give*.
*If you are using one of the uber expensive programs or something like AviSynth you could probably do it there, but I would still avoid it when something simple like this exists.
The good news about doing it outside of a video environment is that you can simply mux the new audio back in and save yourself a video encode.
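If you would rather script the offline version than click through an editor, the same subtraction on a dumped WAV could look something like the sketch below. It assumes a 16-bit stereo PCM file with music on the left and music + vocals on the right; the function names and file paths are placeholders of mine, and it only uses Python's standard `wave` and `struct` modules.

```python
import struct
import wave


def difference_frames(raw, gain=1.0):
    """Turn interleaved 16-bit stereo bytes into mono bytes of right - gain * left."""
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    left, right = samples[0::2], samples[1::2]
    mono = []
    for l, r in zip(left, right):
        value = int(round(r - gain * l))
        mono.append(max(-32768, min(32767, value)))  # clip to the 16-bit range
    return struct.pack("<%dh" % len(mono), *mono)


def extract_difference(src_path, dst_path, gain=1.0):
    """Read a 16-bit stereo WAV and write a mono WAV holding right - gain * left."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV")
        rate = src.getframerate()
        raw = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(difference_frames(raw, gain))
```

Something like `extract_difference("dumped.wav", "vocals.wav")` would then give you a file to listen to before deciding whether it is worth muxing back in.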