How to delete files that are not in the server database? (server 9.1.4)


#1

A few days ago, while I was connecting a third disk to the server, the sdb and sdc drives got swapped. As a result, the "greedy" rsync backup HDD, which still held deleted files, became the "main" cloud storage.
At night Cron ran "occ files:scan", and the old deleted files were synced back to users.
In the morning I got a flood of complaints from users and restored the cloud database from the previous day. But the files still physically exist on both drives (backup and main). Now I can't run "occ files:scan", and users see many RED errors about case conflicts, non-existent files, etc.
How can I fix this without hunting down every "old" deleted file by hand?
Or can I export a list of files with full paths from the database?


#2

OK, I did it myself over the weekend.
1st, find the files by their inode change time (ctime) with the find command:

find /media/backup/owncloud/data -mindepth 1 -type f -newerct '2017-11-01 02:08' ! -newerct '2017-11-01 04:55' -printf "\"%p\" \n" > /tmp/todel.csv

where the two timestamps bound the night-time window in which rsync ran and created these files, so their ctime falls inside it. You can check a file's ctime with "stat filename.txt".
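To check by hand whether a given file falls inside that window before trusting the find output, a small helper can compare its ctime with the same bounds find uses (newer than the start, not newer than the end). This is a sketch assuming GNU coreutils ("stat -c", "date -d"); the function name in_window is hypothetical.

```shell
#!/bin/bash
# Succeed if FILE's inode change time (ctime) lies in the same window that
# "find -newerct START ! -newerct END" selects: after START, not after END.
# Comparison is to the second. Assumes GNU coreutils (stat -c, date -d).
in_window() {
    local file=$1 start=$2 end=$3 ctime s e
    ctime=$(stat -c '%Z' -- "$file")   # ctime in seconds since the epoch
    s=$(date -d "$start" +%s)
    e=$(date -d "$end" +%s)
    [ "$ctime" -gt "$s" ] && [ "$ctime" -le "$e" ]
}
```

For example: in_window /media/backup/owncloud/data/aleksey/files/Documents/Example.odt '2017-11-01 02:08' '2017-11-01 04:55' && echo "in window"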
2nd, export all files recorded in the database (1.7 million+ rows) to existed.csv with phpMyAdmin:

SELECT oc_storages.id, oc_filecache.path FROM oc_filecache INNER JOIN oc_storages ON oc_storages.numeric_id=oc_filecache.storage

3rd, convert existed.csv to UTF-8 with LF line endings in vim (Geany also works, but it sometimes crashes on files this big).
4th, also in vim, rewrite lines like "home::aleksey";"files/Documents/Example.odt" into "/media/backup/owncloud/data/aleksey/files/Documents/Example.odt".
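The rewrite in step 4 can also be done with a single sed command instead of vim. This sketch assumes the export uses ";" as the separator and '"' as the enclosure (as in the example line), already has LF line endings, and that only home storages ("home::<user>") matter; other rows are dropped. It emits bare paths without quotes, whereas todel.csv from step 1 wraps each path in quotes with a trailing space, so the two lists must be brought to the same format before subtracting. The function name csv_to_paths and the DATA_DIR variable are hypothetical.

```shell
#!/bin/bash
# Turn phpMyAdmin CSV rows such as
#   "home::aleksey";"files/Documents/Example.odt"
# into absolute paths under the ownCloud data directory.
# Assumptions: ';' separator, '"' enclosure, LF line endings,
# home storages only; rows that do not match are not printed.
DATA_DIR=${DATA_DIR:-/media/backup/owncloud/data}

csv_to_paths() {
    # \1 = user name from the storage id, \2 = path inside that storage
    sed -n -E 's|^"home::([^"]*)";"(.*)"$|'"$DATA_DIR"'/\1/\2|p' "$@"
}
```

For example: csv_to_paths existed.csv > existed_paths.txt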
5th, compute todel.csv minus existed.csv with the minus.rb script from https://forum.antichat.ru/threads/vychitalka-strok.162871/ , which produces result.csv. The script played a dirty trick on me: current Ruby wants # encoding: utf-8 at the very top of the .rb file, although a few years ago it worked without it.
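If Ruby gets in the way, the same "A minus B" subtraction can be sketched with standard tools, sort and comm, instead of minus.rb. It assumes both files hold one path per line; surrounding quotes and trailing whitespace (as written by the find -printf format in step 1) are stripped first so the two lists compare equal. LC_ALL=C makes sort and comm agree on byte order, which comm needs. The function names normalize and list_minus are hypothetical, and paths containing newlines are not handled.

```shell
#!/bin/bash
# Strip trailing whitespace and surrounding quotes, then sort and dedupe.
normalize() {
    sed -E 's/[[:space:]]+$//; s/^"//; s/"$//' "$1" | LC_ALL=C sort -u
}

# Print the lines of list A that do not appear in list B (A minus B).
# comm -23 suppresses lines only in B (column 2) and lines in both (column 3).
list_minus() {
    LC_ALL=C comm -23 <(normalize "$1") <(normalize "$2")
}
```

For example: list_minus /tmp/todel.csv existed_paths.txt > /tmp/result.csv . Sorting 1.7 million lines this way should fit comfortably in memory on a typical server, since sort spills to temporary files when needed.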
6th, delete the files listed in result.csv:

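#!/bin/bash
# Delete every file named in the list from step 5, one path per line.
# Pass the list as the first argument; the default is the original /tmp/todel2.csv.
LIST=${1:-/tmp/todel2.csv}
date
echo "Deleting..."
while IFS= read -r filename; do
    # Strip the trailing space and surrounding quotes written by find -printf
    filename=${filename%"${filename##*[![:space:]]}"}
    filename=${filename#\"}; filename=${filename%\"}
    rm -- "$filename"
done < "$LIST"
echo "Done!"
date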
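With a list this long, a mistake in any earlier step deletes real user files. A more cautious variant of step 6 moves each listed file into a quarantine directory, keeping its full path underneath, so a bad list can be undone before the files are removed for good. This is a sketch, not what was used above; it assumes the list holds bare absolute paths, and the function name quarantine_files and the quarantine location are hypothetical. Putting the quarantine directory on the same filesystem keeps each mv a cheap rename.

```shell
#!/bin/bash
# Move each file named in LIST (one absolute path per line) under DEST,
# recreating its directory structure, e.g. /a/b.txt -> DEST/a/b.txt.
# Entries that are not regular files are reported on stderr and skipped.
quarantine_files() {
    local list=$1 dest=$2 f
    while IFS= read -r f; do
        [ -f "$f" ] || { echo "skip (missing): $f" >&2; continue; }
        mkdir -p -- "$dest$(dirname -- "$f")"
        mv -- "$f" "$dest$f"
    done < "$list"
}
```

For example: quarantine_files /tmp/result.csv /media/backup/quarantine , then delete /media/backup/quarantine once users confirm nothing is missing.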

From now on I will not use a "greedy" rsync backup, only rsync with the --delete option. To keep old versions, I will instead set up LVM or a versioning file system for the backup volumes on the servers, possibly ZFS.
I think this can help anybody who has to handle a huge amount of data.