Massive GET /status.php requests

Our ownCloud has around 600 users. At certain time (noon) everyday, it crashes.
I have been analyzing the /server-status information from apache status module and noticed that the request for status.php started to pile up with "W" for Sending Reply.
The resources and bandwidth seem fine(htop, iotop). The MySQL database shows a lot of sleeping processes when the status.php requests start to pile up. I have increased MaxRequestWorkers and mysql max_connections many many times and it still keep hitting the max and I started to think that there's bug within the ownCloud application.
Here are screenshots of the server: http://imgur.com/a/7KnjU
Any guide on how to troubleshoot this issue is greatly appreciated.
Please let me know if more information is needed.

Thank you so much!

Steps to reproduce
1. Do not know how to reproduce. It happens at certain time every day.

Expected behaviour
ownCloud server is expected to handle all the requests with no problem.

Actual behaviour
The GET /status.php requests piles up and crash apache2 (server reached MaxRequestWorkers).

Server configuration
Operating system: Ubuntu 14.04
Web server: Apache2 2.4.7
Database: MySQL 5.7.17-0ubuntu0.16.04 (Separate Server)
PHP version: 5.5.9-1ubuntu4.20
OpenSSL version: 1.0.1f
ownCloud version (see ownCloud admin page): 9.1.3
Updated from an older ownCloud or fresh install: Updated from an older ownCloud.
Special configuration (external storage, external authentication, reverse proxy, server-side-encryption): Redis

ownCloud log (data/owncloud.log)

Please paste possible errors in the following code block, see https://central.owncloud.org/t/how-to-find-webserver-or-oc-logfile-enable-php-logfile/808 for more info
Here is the errors before apache2 crashes.

 [mpm_prefork:notice] [pid 21035] AH00169: caught SIGTERM, shutting down
[mpm_prefork:notice] [pid 2275] AH00163: Apache/2.4.7 (Ubuntu) PHP/5.5.9-1ubuntu4.20 OpenSSL/1.0.1f configured -- resuming normal operations
[core:notice] [pid 2275] AH00094: Command line: '/usr/sbin/apache2'
 [ssl:error] [pid 3380] [client 64.41.200.104:39208] AH02042: rejecting client initiated renegotiation
[:error] [pid 9882] [client 207.228.78.106:27263] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
 [:error] [pid 9959] [client 198.254.194.84:64341] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
 [:error] [pid 9936] [client 174.90.251.187:53598] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
 [:error] [pid 9628] [client 10.9.3.190:56101] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
 [:error] [pid 9996] [client 198.254.194.84:63397] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
 [:error] [pid 9905] [client 207.228.78.57:6749] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
[:error] [pid 9427] [client 10.9.3.136:62628] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
 [:error] [pid 9977] [client 199.126.9.18:50411] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
[:error] [pid 9956] [client 208.98.234.17:45985] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
 [:error] [pid 9775] [client 198.254.194.84:58547] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
[:error] [pid 10022] [client 184.151.222.118:7151] PHP Fatal error:  Class 'OCA\\DAV\\Connector\\Sabre\\ExceptionLoggerPlugin' not found in /var/www/html/owncloud/remote.php on line 54
 [ssl:error] [pid 9685] [client 64.41.200.102:58705] AH02042: rejecting client initiated renegotiation
 [core:notice] [pid 2275] AH00051: child pid 13976 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[core:notice] [pid 2275] AH00051: child pid 14267 exit signal Segmentation fault (11), possible coredump in /etc/apache2
[mpm_prefork:error] [pid 14386] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

Integrity status for oC9+

Login as admin user into your ownCloud and access
http://example.com/index.php/settings/integrity/failed
No errors have been found.

That happened last year, too.

Does the file /owncloud/apps/dav/lib/Connector/Sabre/ExceptionLoggerPlugin.php exist?

Hi Skriesch, thank you for the response.

Yes I can confirm that the file ExceptionLoggerPlugin.php does exist in /var/www/html/owncloud/apps/dav/lib/Connector/Sabre.
Do you think this could be the culprit?

The only temporary solution right now is to restart apache2 everyday when it is about to hit the max request.

Maybe PHP runs out of memory somehow and fails to load that file when it happens ?

Did you check the access log to see what connection appears at noon ?
Do you have a cron job (cron.php) running at noon ?

I have PHP memory_limit set to 8GB so that is not an issue and plus i have never seen any PHP memory issue in the error log.
Access log before the Apache crashes:
[13/Feb/2017:12:01:00 -0700] "PROPFIND /remote.php/webdav/ HTTP/1.1" 500 1094 "-" "Mozilla/5.0 (Windows) mirall/2.1.1"
[13/Feb/2017:12:01:00 -0700] "PROPFIND /remote.php/webdav/ HTTP/1.1" 207 1317 "-" "Mozilla/5.0 (Windows) mirall/2.2.4 (build 6408)"
[13/Feb/2017:12:01:00 -0700] "PROPFIND /remote.php/webdav/ HTTP/1.1" 207 1318 "-" "Mozilla/5.0 (Windows) mirall/2.2.2 (build 6192)"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.4 (build 6408)"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.0.2"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.4 (build 6408)"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.4 (build 6408)"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.4 (build 6408)"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.1.0"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.4 (build 6408)"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.3 (build 6307)"
[13/Feb/2017:12:02:26 -0700] "PROPFIND /remote.php/webdav/ HTTP/1.1" 401 1102 "-" "Mozilla/5.0 (Windows) mirall/2.0.2"
[13/Feb/2017:12:02:26 -0700] "PROPFIND /remote.php/webdav/ HTTP/1.1" 401 1102 "-" "Mozilla/5.0 (Windows) mirall/2.1.0"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.4 (build 6408)"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.4 (build 6408)"
[13/Feb/2017:12:02:29 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.4 (build 6408)"
[13/Feb/2017:12:02:27 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.2 (build 6192)"
[13/Feb/2017:12:02:27 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.1.1"
[13/Feb/2017:12:02:26 -0700] "GET /status.php HTTP/1.1" 200 3161 "-" "Mozilla/5.0 (Windows) mirall/2.2.4 (build 6408)"

There is no cron job other than owncloud cron job which runs every 5 minutes.

status.php, hmmm... Does your server have a lot of outgoing federated shares ?

Maybe cron jobs from other servers are checking if the fed shares still exist and are all calling status.php first to find out if your instance is up.

There were some fixes in OC 9.1.4: https://owncloud.org/changelog/#latest9.1

PVince81,
We dont use federated shares.

tflidd,
I will try update the ownCloud to 9.1.4 but I highly doubt it would fix the issue.

From the access log, it seems to be doing a lot of server checking:
"PROPFIND /remote.php/webdav/ HTTP/1.1" 207 1317 "-" "Mozilla/5.0 (Windows) mirall/2.1.1"

The weird thing is that it only doing this at certain time (noon).

New clue:
The Updates section of ownCloud client seems to be stuck with "Checking update server..." even after restarting the client a few times.

is there a way for me to troubleshoot owncloud client?

You can create a more detailed logfile: https://doc.owncloud.org/desktop/2.2/troubleshooting.html

Just to get everyone an update, it was actually caused by our backup job (Veeam + VMware Snapshots).
The backup freezes ownCloud VMs for ~8 seconds and caused high spiked on status.php requests during peak hours.

I guess my question now is, is there a way to fix this so that the requests dont get stuck for a long time, perhaps in php or apache2 configurations?

You can put ownCloud in maintenance mode during the snapshot. The snapshot solution freezes the system during 8 seconds?

1 Like

Hi tflidd,

I have been trying to do this but looks like the requests still pilling up. Put ownCloud on maintenance mode does not really do anything.

I'm wondering if there's anything else I could do from the apache/php side. Making changes on the VM/Veeam might be harder considering the infrastructure.

Thank you for all the pointers so far though.

What do you mean with "maintenance mode does not really do anything"? When you put oC in maintenance mode, the clients will pause till you put oC back into working mode. The web frontend will also say "system in maintenance mode" etc. ...

What I meant was, enabling the maintenance mode does not remove the current massive stalled requests. The stalled requests only go away after restarting the apache2 service.

Correct. But if you enable maintenance mode each time, prior to your backup those should not pile up?!

Can you try to shift your cronjob a bit, perhaps there are other tasks (could be one of the users as well).

I cannot put the maintenance mode on during the backup because the backup job start time and duration varies.
There is no other cronjob in the system other than for ownCloud.

For now, I have created a cronjob script to restart apache2 service a couple of times after the backup jobs finish.
This seems to be working good so far. This is a temporary solution until I get the infrastructure side figured out.