Our backups on Puma have been taking a long time, finishing late in the day, barely ahead of the next backup. The problem seemed to be in the cp -al step rather than the rsync step. I investigated by timing the cp -al on each directory, using code like this:
for a in `ls -A $from`; do
    now=`date +%T`
    echo "$now cp -al $from/$a to $to/$a"
    cp -al $from/$a $to/$a
done
and the result looked like:
16:38:06 cp -al 2014-03-26/alice to today/alice
16:38:06 cp -al 2014-03-26/anderson to today/anderson
16:39:37 cp -al 2014-03-26/apache-tomcat-5.5.26 to today/apache-tomcat-5.5.26
16:39:42 cp -al 2014-03-26/appinvstats to today/appinvstats
22:59:14 cp -al 2014-03-26/appinv-stats to today/appinv-stats
22:59:14 cp -al 2014-03-26/btjaden to today/btjaden
23:11:11 cp -al 2014-03-26/compbio to today/compbio
23:11:12 cp -al 2014-03-26/cs to today/cs
23:11:24 cp -al 2014-03-26/cs110f11 to today/cs110f11
...
It finished at about midnight (so, less than 8 hours total), but essentially all of that time was in the appinvstats directory.
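To spot the slow directory without reading through timestamps, the loop could also report elapsed time per copy. A minimal sketch, using the same $from and $to variables as above:

for a in `ls -A $from`; do
    start=`date +%s`
    cp -al $from/$a $to/$a
    end=`date +%s`
    # report how many seconds this directory's copy took
    echo "$a took $((end - start)) seconds"
done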
Sure enough, some subdirectories of that account had a *lot* of inodes. Here are some useful references:
http://unixetc.co.uk/2012/05/20/large-directory-causes-ls-to-hang/
http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/
http://www.pronego.com/helpdesk/knowledgebase.php?article=59
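To see where the inodes are concentrated within the account, one approach is to count entries per subdirectory with find, which (unlike plain ls) doesn't try to sort enormous directories. A rough sketch; the appinvstats path is just illustrative:

# count files + directories under each subdirectory, biggest last
for d in appinvstats/*/; do
    n=`find "$d" | wc -l`
    echo "$n $d"
done | sort -n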
A count of the inodes:
ls -fR collectedStats | wc -l
5057712
ls -fR errorFiles | wc -l
4438339
So, roughly 5 million files/folders (inodes) in each, close to 10 million in total.
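(The ls -fR output also includes a directory-name header and a blank line per directory, plus the . and .. entries that -f implies, so the wc -l figure slightly overstates the true count. A find-based check gives an exact count of entries and never sorts anything:)

find collectedStats | wc -l
find errorFiles | wc -l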
Turning these into tar files would reduce them to two inodes (one per archive). There’s also a savings in space:
du -csh errorFiles errorFiles.tar
129G    errorFiles
106G    errorFiles.tar
du -csh collectedStats collectedStats.tar
11G     collectedStats
3.3G    collectedStats.tar
We’ll look at replacing these directories with the tar files.
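When we do, the cautious order is probably: create the archive, verify it against the directory, and only then remove the original tree. A rough sketch (run from the account's home directory; GNU tar's -d/--compare does the verification):

# create the archive (already done above for these two directories)
tar -cf errorFiles.tar errorFiles

# compare the archive against the directory; tar -d reports any member
# that differs or is missing, so no output means a clean match
tar -df errorFiles.tar

# remove the original tree only once the compare comes back clean
rm -rf errorFiles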