There are a few things happening there that make it effective. On the beat, the video slows down. That could be done with a time effect (Timewarp) or by time remapping the layer.
Next, the video is duplicated twice, and the two copies are masked and stacked above the original layer. Position keyframes move the two upper layers into place. These layers are at less than 100% opacity, and a couple of opacity keyframes fade them out quickly, revealing the original image again.
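If it helps to see the timing as numbers, here is a rough sketch in Python (not After Effects expression code) of what the time remap and opacity keyframes are doing. All of the values are made up for illustration: I'm assuming the beat lands at 2.0 s, the slowdown is 25% speed for one second, and the duplicate layers start at 60% opacity on the beat. The `interpolate` helper is a hypothetical stand-in for the plain linear interpolation that happens between two keyframes.

```python
from bisect import bisect_right

def interpolate(keyframes, t):
    """Linearly interpolate a value at time t from sorted (time, value) keyframes.

    Values are held constant before the first and after the last keyframe.
    """
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Assumed values, for illustration only.
# Time remap: comp time (s) -> source time (s). From 2.0 s to 3.0 s of comp
# time the source advances only 0.25 s, so the footage plays at 25% speed.
TIME_REMAP = [(0.0, 0.0), (2.0, 2.0), (3.0, 2.25), (4.0, 3.25)]

# Opacity (%) of the duplicated layers, which start on the beat:
# 60% at 2.0 s, fading to 0% by 2.5 s.
OVERLAY_OPACITY = [(2.0, 60.0), (2.5, 0.0)]

if __name__ == "__main__":
    for comp_time in (1.0, 2.0, 2.25, 2.5, 3.0, 4.0):
        src = interpolate(TIME_REMAP, comp_time)
        opacity = interpolate(OVERLAY_OPACITY, comp_time)
        print(f"comp {comp_time:4.2f}s -> source {src:5.3f}s, overlay opacity {opacity:4.1f}%")
```

In the real project you would set these as keyframes on Time Remap and Opacity rather than computing anything yourself; the sketch is only meant to show how the two sets of keyframes line up around the beat.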