If you're talking about the white text that is superimposed on the scene, indicating what is being said, it can be done with text events (Media Generators tab, Text event) that are controlled with keyframes in Track Motion - 3D Source Alpha. There will be a lot of keyframing needed to give the illusion that the text is part of the scene. It can all be done from within Vegas.
I tried many times, but with keyframing it seems nearly impossible to make realistic "inside the scene" text. Maybe it's just me, but isn't there some other way? I've been trying for some hours now, and the text isn't even close to being as stationary as the one in the video. Thank you!
PS. I even tried something with motion tracking and After Effects, but I can't figure it out.