Backup Preparation

When integrating with tools like tar or rsync, find acts as an intelligent pre-filter, identifying exactly which files need to be backed up based on complex criteria.

1. Incremental Tar Backups

To archive only the files modified in the last 24 hours, let find build the file list and feed it directly to tar.

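We use -print0 in find and --null -T - in tar so that each filename is terminated by a NUL byte instead of a newline. Names containing spaces, newlines, or other special characters then pass through intact. The `-T -` part tells tar to read the file list from standard input.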

# Archive every regular file modified in the last 24 hours
find /var/www/html -type f -mtime -1 -print0 | tar -czvf /backups/inc_backup_$(date +%F).tar.gz --null -T -
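To check that the NUL-delimited pipeline really preserves awkward names, here is a small self-test on a throwaway directory. It is a sketch that assumes GNU find, GNU tar and GNU coreutils (`touch -d`). The temporary paths stand in for /var/www/html and /backups and are not part of the original setup. The test creates one new file whose name contains both a space and a newline, plus one file dated three days back. It archives with the same find | tar pipeline (without -v), then extracts the archive to see what was captured.

```shell
#!/bin/sh
# Self-test of the find -print0 | tar --null -T - pipeline on a scratch tree.
# Assumes GNU find, GNU tar and GNU coreutils (touch -d); the temporary
# directories are stand-ins for /var/www/html and /backups.
set -eu

demo=$(mktemp -d)
mkdir -p "$demo/site" "$demo/restore"

# A freshly created file whose name contains a space AND a newline.
touch "$demo/site/new page
draft.html"
# A file last modified three days ago, outside the -mtime -1 window.
touch -d '3 days ago' "$demo/site/old.html"

# Same pipeline as above (minus -v), run from $demo so member names are
# relative (site/...) instead of absolute.
cd "$demo"
find site -type f -mtime -1 -print0 | tar -czf archive.tar.gz --null -T -

# Extract into a separate directory to see what was actually archived.
tar -xzf archive.tar.gz -C restore
# Count files by emitting one character per file (names may contain newlines).
printf 'restored %s file(s)\n' "$(find restore -type f -printf x | wc -c)"
```

Only the recently modified file should come back, with its newline-containing name unchanged.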

2. Generating Rsync File Lists

rsync is powerful, but its internal filtering logic can be complex. Sometimes it is easier to use find to generate an exact list of files, and tell rsync to sync only those files.

# 1. Generate the list of critical config files
find /etc -type f \( -name "*.conf" -o -name "*.ini" \) > /tmp/backup-list.txt

# 2. Feed the list to rsync
rsync -avz --files-from=/tmp/backup-list.txt / remote_server:/backups/configs/
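rsync reads --files-from entries as paths relative to the source argument, stripping any leading slash. --files-from also implies --relative, so the command above recreates the etc/... tree under /backups/configs/. You may want the contents of /etc to land directly in /backups/configs/, or want the list to survive unusual filenames the way the tar example does. One option in either case is a NUL-delimited list relative to /etc. GNU find can produce it with `-printf '%P\0'`, and rsync's `--from0` (`-0`) option reads it.

The sketch below runs only the list-building half, on a throwaway tree that stands in for /etc; the nginx and php filenames are illustrative. The rsync line is left as a comment because remote_server is a placeholder.

```shell
#!/bin/sh
# Sketch: build a NUL-delimited --files-from list whose entries are relative
# to the rsync source directory. Assumes GNU find (-printf); the demo tree
# is a hypothetical stand-in for /etc.
set -eu

src=$(mktemp -d)
mkdir -p "$src/nginx" "$src/php/8.2"
touch "$src/nginx/nginx.conf" "$src/nginx/mime.types" "$src/php/8.2/php.ini"

list=$(mktemp)
# %P prints each path with the starting point ("$src/") stripped;
# \0 terminates each entry with a NUL byte instead of a newline.
find "$src" -type f \( -name '*.conf' -o -name '*.ini' \) -printf '%P\0' > "$list"

# Print the list one entry per line, sorted, for inspection.
tr '\0' '\n' < "$list" | sort

# Matching transfer, not run here (remote_server is a placeholder):
# rsync -avz --from0 --files-from="$list" "$src"/ remote_server:/backups/configs/
```

Because the entries are relative to the source directory, the trailing slash on `"$src"/` is what makes the files appear directly under the destination.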

3. Finding Files Larger Than the Backup Threshold

If your cloud backup provider restricts individual file or upload sizes, use find to locate the offending files before the backup job fails on them. For example, Amazon S3 accepts at most 5 GB in a single PUT request, and larger objects need a multipart upload.

# Locate files over 5 GiB and record them in an audit log
# ({} + passes many paths to one ls call instead of one call per file)
find /data -type f -size +5G -exec ls -lh {} + > /var/log/large-files-audit.txt
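ls -lh output is easy to read but awkward to post-process. A sortable alternative is sketched below, assuming GNU find's `-printf` and coreutils' `truncate`. The helper name `audit_large_files` is hypothetical. It writes the byte size and path, tab-separated, largest first. The demo uses sparse files and a 1M threshold so it runs harmlessly in a scratch directory.

```shell
#!/bin/sh
# Sketch: machine-readable size audit, largest first.
# Assumes GNU find (-printf) and GNU coreutils (truncate).
# audit_large_files is a hypothetical helper, not part of any tool.
set -eu

# audit_large_files DIR LIMIT OUTFILE
#   LIMIT uses find's -size syntax (e.g. 5G). find rounds sizes up to whole
#   units before comparing, so +5G matches files larger than 5 GiB.
audit_large_files() {
    find "$1" -type f -size +"$2" -printf '%s\t%p\n' | sort -rn > "$3"
}

# Demo on a scratch tree with a 1M threshold instead of 5G.
data=$(mktemp -d)
truncate -s 3M "$data/video.mp4"    # sparse: no real disk space used
truncate -s 2M "$data/dump.sql"
truncate -s 10K "$data/notes.txt"   # below the threshold

report=$(mktemp)
audit_large_files "$data" 1M "$report"
cat "$report"
```

On the real system, the equivalent call would be `audit_large_files /data 5G /var/log/large-files-audit.txt`.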

4. Staging Data for Migration

When migrating servers, you often want to move the data but leave behind OS-generated artifacts, cache files, and old sessions.

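# Move PHP session files not modified for at least 3 full days (find truncates
# ages to whole days, so -mtime +2 means "more than 2 whole days") into a
# temporary trash directory instead of migrating them to the new server.
mkdir -p /tmp/session_trash
find /var/lib/php/sessions -type f -mtime +2 -execdir mv {} /tmp/session_trash/ \;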
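find counts -mtime ages in whole 24-hour periods and discards the fraction, so -mtime +2 matches only files at least 3 full days old. The sketch below previews the match with -print before anything is moved. It assumes GNU find and GNU touch, runs on throwaway directories, and uses hypothetical `sess_*` filenames. It also contrasts -mtime with `-mmin +2880` (strictly older than 48 hours), then runs the same `-execdir mv` command.

```shell
#!/bin/sh
# Sketch: preview, then move, stale session files on scratch directories.
# Assumes GNU find and GNU coreutils (touch -d); sess_* names are
# hypothetical stand-ins for PHP session files.
set -eu

sessions=$(mktemp -d)   # stands in for /var/lib/php/sessions
trash=$(mktemp -d)      # stands in for /tmp/session_trash

touch -d '60 hours ago' "$sessions/sess_aaa"   # 2.5 days old
touch -d '80 hours ago' "$sessions/sess_bbb"   # about 3.3 days old

# Whole-day truncation: sess_aaa counts as 2 days, so +2 skips it.
echo "Matched by -mtime +2:"
find "$sessions" -type f -mtime +2 -print

# 2880 minutes = 48 hours, so both files match here.
echo "Matched by -mmin +2880:"
find "$sessions" -type f -mmin +2880 -print

# Once the preview looks right, run the real move.
find "$sessions" -type f -mtime +2 -execdir mv {} "$trash"/ \;
ls "$trash"
```

Swapping `-mv` for `-print` first is a cheap dry run. Choose `-mmin` when the cutoff has to be exact to the hour.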