That's really the best way.
Sometimes the source material isn't that easy to track (for example, where do the parts of the face go when the hand moves in? Easy for us humans, hard for the computer). Of course, this keeps the temporal part of DE:Noise from doing as good a job as it does when things are more easily tracked.
And in those areas where the hand moves in front of the face, you might try a bit more spatial filtering if you feel too much noise is left after reducing the temporal settings.
Pete
http://www.revisionfx.com