Introduction
My wife has lost a hard drive or two on her computer and now compulsively backs everything up to the server, which is great. Over time, however, she has ended up with as many as four copies of the same file stored in different places. I had of course asked her to clean things up, but she is so busy that she often can't keep up with her email, and the idea of spending a Sunday afternoon cleaning up files on the computer just didn't appeal to her. Enter fdupes...
FDupes
fdupes is a command-line program that finds duplicate files, either within a single directory or recursively through its subfolders, which is handy. It can also delete the duplicates or, in some versions, replace them with hard links to save space. Keep reading for some examples of this in action.
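To give a rough feel for what "finding duplicates" means, here is a minimal sketch that groups files by MD5 checksum using find, md5sum, sort and awk. It is only an approximation for illustration: list_dupes is a made-up helper name, it assumes GNU md5sum is installed, and it will mangle file names that contain newlines or backslashes. fdupes itself is more careful and faster.

```shell
# list_dupes DIR: print groups of files under DIR whose contents share an MD5 checksum.
# Hypothetical helper for illustration only; it is not part of fdupes.
list_dupes() {
  find "$1" -type f -exec md5sum {} + |
    sort |
    awk '
      {
        sum  = substr($0, 1, 32)   # md5sum prints 32 hex digits first
        name = substr($0, 35)      # then two separator characters, then the path
      }
      sum == prev {
        if (count == 1) {          # second file with this checksum: start a group
          if (printed) print ""    # blank line between groups
          print first
          printed = 1
        }
        print name
        count++
        next
      }
      { prev = sum; first = name; count = 1 }
    '
}
```

Running list_dupes . prints each group of identical files with a blank line between groups, much like the fdupes listing.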
Examples
The most basic use is simply listing the duplicates. The -r flag tells fdupes to recurse into all subfolders when looking for files to compare.
[tethys]:/home/sarah> sudo fdupes -r .
./BACKUP OF LAPTOP/Fonts/RAGE.TTF
./Clients/SVdP/Web Designs/RAGE.TTF

./BACKUP OF LAPTOP/Fonts/EccentricStd.otf
./Clients/BarbaraGraham/webdev/EccentricStd.otf

./BACKUP OF LAPTOP/Fonts/ACaslonPro-Regular.otf
./Clients/Thrasher/ACaslonPro-Regular.otf

./BACKUP OF LAPTOP/Fonts/ACaslon0.otf
...
If you want to actually delete the duplicates and keep only the first file in each set, use the command below. The -d flag deletes, and -N skips the prompts and keeps the first file automatically:
sudo fdupes -rdN .
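Since -N deletes without asking, it may be worth previewing what would go first. The sketch below assumes the default fdupes listing (one path per line, a blank line between sets) and prints every path except the first in each set, which is what -dN would remove. would_delete is a made-up helper name, not an fdupes feature.

```shell
# would_delete: read an fdupes-style listing on stdin (one path per line,
# blank line between sets) and print every path except the first of each set.
# Hypothetical helper for illustration; it deletes nothing.
would_delete() {
  awk '
    /^$/ { seen = 0; next }   # blank line: the next path starts a new set
    seen { print; next }      # not the first path in its set: would be deleted
         { seen = 1 }         # first path in its set: the one that is kept
  '
}
```

Use it as fdupes -r . | would_delete and read through the list before running the real thing. Where available, fdupes' own -f (--omitfirst) option should produce a similar list.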
If you would rather make hard links, so that your spouse won't hate you when she can't find her files, add the -L flag. This flag isn't available in all versions of fdupes, so your mileage may vary. A patch that adds it is available here
I also found a code snippet for those without the patched version. This variant reads the fdupes listing one path per line and quotes each path, so file names with spaces are handled:
fdupes -r path | while IFS= read -r file; do if [ -z "$file" ]; then first=""; elif [ -z "$first" ]; then first="$file"; else ln -f -- "$first" "$file"; fi; done
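To confirm that two paths really are the same file on disk after linking, a quick check along these lines should do. It relies on the shell's -ef test, which compares device and inode numbers. same_file is a made-up helper name, and note that -ef follows symbolic links, so it is also true for a symlink pointing at the file.

```shell
# same_file A B: succeed if A and B refer to the same file on disk
# (same device and inode), as they do after hard-linking.
# Hypothetical helper for illustration.
same_file() {
  [ "$1" -ef "$2" ]
}

# Example with two paths from the listing above; prints "separate"
# if either file is missing or the two are distinct copies.
if same_file "./BACKUP OF LAPTOP/Fonts/RAGE.TTF" "./Clients/SVdP/Web Designs/RAGE.TTF"; then
  echo "linked"
else
  echo "separate"
fi
```

Another way to eyeball it is ls -li, which shows the inode number in the first column and the link count after the permissions.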
And tada, cleaned up mess of many backups and files still where they are expected.