Hi again,
After searching some more I found an even better explanation for this. Maybe a better description is to layer (rather than merge) video and audio into a single entity (not unlike a Photoshop image), which can then be used in the editing. There are also several threads that come up in the FCP forum here on Creative Cow if you search for "Merge Clips".
Concerning Vegas, the closest I've managed to find is in this thread. The workaround suggested there is to sync on the timeline and then either render out to a file (but then losing the possibility of having layers of audio or video) or save it as a .veg file, which seems to pretty much do the trick.
I'll look into PluralEyes. Thanks, Steve, for pointing it out (there is so much useful software out there, it's impossible to keep track). According to their FAQ it should be possible.