Old files do not get deleted


#1

I am starting a new thread since the old one about webcron problems is running long, but what caused me to start that other thread is still the core problem: my ownCloud instance is filling up with old files (old versions of files and files in the trash).

I am running ownCloud 10.0.10 on a hosted server (without SSH access). I have these two lines in my config.php file:

  'versions_retention_obligation' => '1,2',
  'trashbin_retention_obligation' => '1,2',

From these, I was expecting file versions and trash files older than two days to be deleted by the cron job, but that is not happening.
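
For illustration, here is a minimal standalone sketch of that reading of the '1,2' value: the first number as a minimum and the second as a maximum age in days, with anything older than the maximum due for deletion. This is not ownCloud's code. The function names parseRetentionObligation and isPastMaxRetention are made up for this example, and only the two-value form is handled.

```php
<?php
// Illustrative only: NOT ownCloud's implementation. All names are hypothetical.

/**
 * Split a 'min,max' retention string such as '1,2' into day limits.
 * 'auto' on either side is returned as null (no fixed limit).
 */
function parseRetentionObligation(string $value): array
{
    $parts = array_map('trim', explode(',', $value));
    if (count($parts) !== 2) {
        throw new InvalidArgumentException("Expected 'min,max', got '$value'");
    }
    $toDays = function (string $part): ?int {
        if ($part === 'auto') {
            return null;
        }
        if (preg_match('/^\d+$/', $part) !== 1) {
            throw new InvalidArgumentException("Not a day count: '$part'");
        }
        return (int) $part;
    };
    return ['minDays' => $toDays($parts[0]), 'maxDays' => $toDays($parts[1])];
}

/** True if an item of the given age (in days) is older than the maximum. */
function isPastMaxRetention(int $ageDays, array $limits): bool
{
    return $limits['maxDays'] !== null && $ageDays > $limits['maxDays'];
}

$limits = parseRetentionObligation('1,2');
// A 3-day-old item is older than the 2-day maximum.
var_dump(isPastMaxRetention(3, $limits));
```

Under this reading, a cleanup run would only need each item's age in days and the parsed limits to decide what to purge.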

webcron seems to run properly now (it is called from a 1&1 web service once a day; that is all they offer, no option for running things at 15 min intervals).

I don’t see anything special in the log file; I even added a log entry line to cron.php just to log whenever it is run and it seems to be running fine.

Any ideas what is preventing the old stuff from being deleted?


#2

Hey,

I'm not sure if we should continue in "Cron job for purging files_trashbin not working", where we have discussed this topic. Maybe the suggestion here is still valid:


#3
  • Access rights on the cron job? Should be run as user www-data or similar.
  • Increase log-level if not yet done.

Consider switching to another webcron provider, e.g. easycron, as suggested earlier.


#4

Seems hard to get the right balance between continuing one thread and reusing another…

Regarding that suggestion: I did reply to it, and at least the documentation says we need the quotes. Is that ‘=> 1,2’ even correct PHP syntax?


#5

This is a hosted domain, so all files have the same owner, and I cannot change any file ownership.
cron.php has permission level 604, so user +read +write, group nothing, public +read.
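
As a quick check of that reading of 604, a small standalone sketch that decodes an octal mode into the usual rwx notation (the function name describeMode is just made up for this example):

```php
<?php
// Decode an octal file mode such as 0604 into rwx strings per class.
function describeMode(int $mode): array
{
    // Bit shift that moves each class's three permission bits to the low end.
    $classes = ['user' => 6, 'group' => 3, 'other' => 0];
    $result = [];
    foreach ($classes as $name => $shift) {
        $bits = ($mode >> $shift) & 7;
        $result[$name] = (($bits & 4) ? 'r' : '-')
            . (($bits & 2) ? 'w' : '-')
            . (($bits & 1) ? 'x' : '-');
    }
    return $result;
}

// 0604: user rw-, group ---, other r--
print_r(describeMode(0604));
```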

Also: cron.php is running, as I can see in the log output I added.

What log level is needed for cron output?

Edit: I just set the log level to the highest level, using the admin interface, but now the log file is drowned in entries related to “validateToken” – that is not terribly useful.

Why would that make any difference to the overall cron behavior, if it is run every 15 min instead of once in 24h? If webcron is run, I would expect it to do a “full” run, i.e. clean up everything that needs cleaning up. I don’t need this to be done every 15 min, this server is not used that intensely.

I must say this whole cron cleanup business is pretty opaque…


#6

Hey,

I think the question is how ownCloud handles this internally. When I check config.sample.php, none of the numbers are surrounded by quotes, whereas everything else (besides true/false) is.

Why not just try your luck with something like:

  'trashbin_retention_obligation' => 1,2,


#7

Those are probably just single numbers, right?

  • … because it’s incorrect syntax
  • … because the documentation says otherwise
  • … because I don’t know whether/where errors in config.php are reported
  • … because I do not know how to find out whether it is successful or not (other than waiting for two days)
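
On the syntax point, a quick standalone check in plain PHP (nothing ownCloud-specific, and the variable names are only for illustration) suggests that the unquoted form does parse inside an array, but not as a single value: PHP treats the comma as an element separator. How ownCloud would then interpret an integer 1 for this setting is not something this sketch can show.

```php
<?php
// Quoted: one string value, the form the documentation shows.
$quoted = [
    'trashbin_retention_obligation' => '1,2',
];

// Unquoted: the comma separates array elements, so this is the key
// with the integer 1, followed by a separate element 2 at index 0.
$unquoted = [
    'trashbin_retention_obligation' => 1,2,
];

var_dump($quoted);
var_dump($unquoted);
```

So the unquoted variant would not raise a parse error, but it would silently drop the second number from the setting and add a stray entry to the config array.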

BTW, I just realized that if you use the internal text editor of ownCloud to edit a text file, you get dozens of file versions within a minute of typing. It seems to be saving on each keypress!

[Screenshot: text_edit_versions]

This is pretty bad, especially if old versions do not get purged…!


#8

Hey,

So you will probably have to wait for someone with more knowledge about ownCloud to answer here. :confused:

Maybe it's even necessary to report a bug to the ownCloud development team.


#9

OK, I might be onto something here:

  1. It looks like a webcron job is only doing one single task at a time, i.e. it will remove one old file only – can anyone confirm this?

  2. I hacked my cron.php to change this.
    Original cron.php file (around line ~140):

		// We call cron.php from some website
		if ($appMode == 'cron') {
			// Cron is cron :-P
			OC_JSON::error(['data' => ['message' => 'Backgroundjobs are using system cron!']]);
		} else {
			// Work and success :-)
			$jobList = \OC::$server->getJobList();
			$job = $jobList->getNext();
			if ($job != null) {
				$job->execute($jobList, $logger);
				$jobList->setLastJob($job);
			}
			OC_JSON::success();
		}
    

My hacked version of this part:

		// We call cron.php from some website
		if ($appMode == 'cron') {
			// Cron is cron :-P
			OC_JSON::error(['data' => ['message' => 'Backgroundjobs are using system cron!']]);
		} else {
			$jobList = \OC::$server->getJobList();
			
			// We only ask for jobs for 1 minute because 1&1 webhosting will interrupt anything longer
			$endTime = \time() + 1 * 60;

			$executedJobs = [];
			while ($job = $jobList->getNext()) {
				if (isset($executedJobs[$job->getId()])) {
					$jobList->unlockJob($job);
					break;
				}

				$job->execute($jobList, $logger);
				// clean up after unclean jobs
				\OC_Util::tearDownFS();

				$jobList->setLastJob($job);
				$executedJobs[$job->getId()] = true;
				unset($job);

				if (\time() > $endTime) {
					\OC::$server->getLogger()->warning('CRON TIME OUT');
					break;
				}
			}
			OC_JSON::success();
		}

So this is supposed to run for up to one minute and process as many cron tasks as possible in that time.

From what I can see so far, all the old stuff is gone on my system. Yay!


#10

Hey,

it seems the documentation on background jobs confirms this:

Secondly, while Webcron is better than Ajax, it too has limitations. For example, running Webcron will only remove a single item from the job queue, not all of them. Cron, however, will clear the entire queue.

where it is highlighted that “Cron” should be used where possible:

It’s for this reason that we encourage you to use Cron — if at all possible.


#11

Thanks! I missed that webcron also does only one job at a time, and I fail to understand the reason for that. I do understand that AJAX can only do one thing at a time, otherwise pages would be too slow. But webcron is only run once in a while, so it could easily do more than just a single job. With the recommended webcron interval of 15 minutes, it can run a total of 96 jobs in 24h. That is not a lot if you consider that the text editor in ownCloud creates dozens of versions while typing, all of which need to be deleted at some point.
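
The arithmetic behind that number, as a tiny sketch that assumes exactly one job per webcron call (the function name webcronJobsPerDay is made up for this example):

```php
<?php
// With one job handled per webcron call, jobs per day equals calls per day.
function webcronJobsPerDay(int $intervalMinutes): int
{
    return intdiv(24 * 60, $intervalMinutes);
}

echo webcronJobsPerDay(15), "\n";      // every 15 minutes: 96
echo webcronJobsPerDay(24 * 60), "\n"; // once a day, as on this host: 1
```

With a single call per day, that leaves one job per day, which would explain why the old versions and trash files pile up faster than they are purged.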

My hack seems a lot more reasonable: run webcron only once a day, but then allow it to process as many jobs as fit into one minute of processing. A bit like garbage collection. So far, this seems to be working just fine (and 1&1 will not let me run webcron at 15 min intervals anyway).

I would do that but the problem is that the proper cron command (or equivalent) is not available for me. Only a webcron interface.


#12

Hey,

I'm not sure, but maybe there is a reason for the current webcron behavior, and you are just lucky that it works with your modification?

I have read some discussions in the past where someone suggested dropping support for environments like this, without access to the console / command line.