Specifying the video codec as copy (i.e. "-vcodec copy") copies the video stream without re-encoding. If you want to replace the audio, use more than one input and use stream mapping to specify which source streams end up in the output.
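As a rough sketch of what that looks like in practice, here is a small Python helper that builds such a command line. The function name and the file names are made up for illustration, and it assumes the video is stream 0 of the first input and the replacement audio is stream 0 of the second input; check your own files before relying on that.

```python
def replace_audio_cmd(video_path, audio_path, output_path):
    """Build an ffmpeg argument list that keeps the video from one file
    and takes the audio from another (hypothetical helper)."""
    return [
        "ffmpeg",
        "-i", video_path,    # input 0
        "-i", audio_path,    # input 1
        "-map", "0:0",       # assumption: stream 0 of input 0 is the video
        "-map", "1:0",       # assumption: stream 0 of input 1 is the audio
        "-vcodec", "copy",   # copy the video stream without re-encoding
        output_path,
    ]


# Hypothetical file names; print the command rather than running it.
print(" ".join(replace_audio_cmd("video.mp4", "new_audio.m4a", "output.mp4")))
```

To actually run it, the list can be passed to subprocess.run(cmd, check=True). No audio codec is given here, so FFmpeg will pick its default encoder for the output container.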
How do we understand streams? How can a video file have more than one audio or video stream? When we use the 'map' option, we indicate a stream; does 0:0 mean "first stream of a video"? I always thought a video file has one audio stream, which can consist of two channels (stereo audio).
A video file can have an arbitrary number of streams. They are simply separate data streams chopped up into chunks and interleaved together. They do not have to be just audio or video either. For example, a stream can hold time code, closed captioning, subtitles or metadata.
A good example you might be familiar with is DVDs. They often have more than one audio stream to support multiple languages. When you switch from English to Spanish, the player is just changing which audio stream it reads.
Likewise, audio streams can contain any number of channels (e.g. surround sound). FFmpeg can also do channel mapping.
You can have multiple video streams in different codecs, different resolutions, and even different frame rates. You are limited only by the format they are muxed into.
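If you want to see which streams a given file actually contains, ffprobe can list them. The sketch below builds an ffprobe command that prints one "index,type" line per stream and parses that text. The helper names and the sample text in the last line are invented for illustration, not output from a real file.

```python
def list_streams_cmd(path):
    """Build an ffprobe argument list that prints 'index,codec_type'
    for every stream in the file (hypothetical helper)."""
    return [
        "ffprobe",
        "-v", "error",                                # hide the banner and info logs
        "-show_entries", "stream=index,codec_type",   # only these two fields
        "-of", "csv=p=0",                             # plain CSV without a section prefix
        path,
    ]


def parse_streams(text):
    """Turn lines such as '0,video' into (index, type) tuples."""
    streams = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        index, codec_type = line.split(",")[:2]
        streams.append((int(index), codec_type))
    return streams


# Made-up sample: one video stream and two audio streams (e.g. two languages).
print(parse_streams("0,video\n1,audio\n2,audio\n"))
```

The command itself can be run with subprocess.run(list_streams_cmd("input.mkv"), capture_output=True, text=True) and its stdout handed to parse_streams.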
The option "-map 0:0" means "first file, first stream" (both numbers count from zero). The first file is whichever file is listed first on the command line via the "-i" option. The first stream is whichever stream comes first inside that file, regardless of the type of stream.
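To make the numbering concrete, here is a tiny illustrative function that spells out what a plain "file:stream" specifier refers to. It is not part of FFmpeg, the name is made up, and it only handles the simple two-number form discussed here.

```python
def describe_map(spec, inputs):
    """Explain a plain 'file:stream' map specifier such as '0:0'.

    `inputs` is the list of files in the order they follow "-i" on the
    command line. Both numbers are zero-based.
    """
    file_idx, stream_idx = (int(part) for part in spec.split(":"))
    return f"stream #{stream_idx} of input #{file_idx} ({inputs[file_idx]})"


# Hypothetical inputs, in command-line order.
inputs = ["video.mp4", "new_audio.m4a"]
print(describe_map("0:0", inputs))
print(describe_map("1:0", inputs))
```

So with two inputs, "-map 1:0" picks the first stream of the second file, whatever type that stream happens to be.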