rsync and checksums

How to ensure an identical copy

How rsync compares files

By default, rsync uses the size and last modified date of a file to decide whether it needs to be transferred. This method is appropriate in most cases, as it is very fast and reliable enough.

The problem arises when a transfer error has occurred. The file will have the proper size, but the content of the copy will differ from the original.

Also, when the last modified date is not reliable, as is the case with some network volumes, this simple check may not be sufficient.
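
The limitation described above can be reproduced with a small experiment. The following is only a sketch: the scratch directory and the file name file.txt are made up for the demonstration, and it assumes rsync, touch and cmp are available on the machine. It creates two files with the same size and the same modification time but different content, and then asks rsync what it would transfer.

```shell
#!/bin/sh
# Scratch directories for the demonstration (hypothetical names).
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/dst"

# Same length, different content.
printf 'AAAA\n' > "$tmp/src/file.txt"
printf 'BBBB\n' > "$tmp/dst/file.txt"

# Copy the modification time of the source file onto the destination file.
touch -r "$tmp/src/file.txt" "$tmp/dst/file.txt"

# Default comparison: size and modification time match, so file.txt
# should not appear among the items rsync would transfer.
rsync --archive --dry-run --itemize-changes "$tmp/src/" "$tmp/dst/"

# A byte-by-byte comparison shows that the contents do differ.
cmp "$tmp/src/file.txt" "$tmp/dst/file.txt" || echo "contents differ"
```

The --dry-run switch keeps rsync from changing anything, so the experiment is safe to repeat.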

Reason for using checksums

When using checksums, rsync calculates a special value based on the whole content of the file and synchronizes the file if that value differs between source and destination. This makes it much less likely for transfer errors to stay undetected, which means there’s a higher level of certainty that source and destination contain identical copies.
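
To illustrate the idea of a value derived from the whole content, here is a minimal sketch that compares files by digest. It is not what rsync runs internally: sha256sum is used only as a stand-in (rsync selects its own algorithm), and the helper names digest and same_content, as well as the scratch files, are hypothetical.

```shell
#!/bin/sh
# Print only the digest of a file. sha256sum is a stand-in for
# whatever algorithm rsync uses internally.
digest() {
    sha256sum < "$1" | cut -d ' ' -f 1
}

# Succeed when both files produce the same digest.
same_content() {
    [ "$(digest "$1")" = "$(digest "$2")" ]
}

# Scratch files: a and b are identical, c has the same size but other content.
tmp=$(mktemp -d)
printf 'AAAA\n' > "$tmp/a"
printf 'AAAA\n' > "$tmp/b"
printf 'BBBB\n' > "$tmp/c"

if same_content "$tmp/a" "$tmp/b"; then echo "a and b match"; fi
if ! same_content "$tmp/a" "$tmp/c"; then echo "a and c differ"; fi
```

Because the digest depends on every byte, two files of equal size and date still produce different values as soon as their content differs.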

The biggest disadvantage of using checksums is speed. Instead of a very fast metadata check, the whole file needs to be read and processed on both sides, and that’s just to decide whether the file needs to be copied over.

Although the checksum algorithm is very fast, it’s not instantaneous, especially if any of the media involved is slow, as is commonly the case when using rsync for backups.

The rsync option

Using the checksum check with rsync is just a matter of adding the -c or --checksum switch to the command. This switches rsync from its normal method of deciding whether files are identical to one based on a checksum algorithm.

An example command:

$ rsync --archive --verbose --checksum src/ dst/
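
Since a checksum run reads every file, it can be useful to preview what it would change before copying anything. The sketch below wraps such a preview in a small function. The name preview_checksum_sync is hypothetical, while --dry-run and --itemize-changes are regular rsync switches.

```shell
#!/bin/sh
# List what a checksum-based sync would change, without copying anything.
# Usage: preview_checksum_sync SRC DST   (hypothetical helper name)
preview_checksum_sync() {
    rsync --archive --checksum --dry-run --itemize-changes "$1" "$2"
}
```

Calling preview_checksum_sync src/ dst/ prints one line for each item that differs. If it prints nothing, rsync found no differences between the two directories.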