Back in 2012 I joined ownCloud to work on the search functionality. I had built a full-text search prototype that used D-Bus to talk to trackerd, and went on to build a PHP-only solution based on zend_search_lucene, known as the search_lucene app. I then applied what I had learned to do the same thing with Elasticsearch for larger instances, where a PHP-only solution would not scale. That became search_elastic. Not much to show for 6.5 years of work. What happened?
I really like solving problems, so I got dragged into a lot of consulting and support work and ended up reading tons of logs and debugging every aspect of ownCloud. We were never out of problems. Some, we were able to solve. ownCloud 10 really has come a long way!
But some problems are symptoms of more fundamental issues: the storage layer architecture, request-based code execution and a single-language app framework.
Furthermore, economics are going to reduce the number of small on-premise deployments. Those users will either move to a hosted service or run their own serverless infrastructure. Both will be used to run cloud native apps. Personally, I think that is where ownCloud development should focus. If only to be a really useful example of maintaining a serverless application, but I digress: there are some …
Interesting problems that need to be solved!
1. Fragile storage layer architecture
To be able to link additional metadata like shares, comments and tags to files, we store file metadata in the oc_filecache table with a unique fileid. If the fileid for a file changes, this additional metadata is lost. This will, e.g., happen if you move the file on disk and do a filescan. Don’t do it. Seriously, just because a table is called filecache does not mean it can be rebuilt without loss. The underlying problem is that we are trying to keep two sources of truth in sync: the filecache and the filesystem. And that has been known since the beginning.
Back in the day, it was decided that we needed a database to store sharing information. We could have investigated extended attributes to integrate better with the underlying filesystem, but that was too far away from the LAMP stack everybody was used to. Times change and, fortunately, some of the heavier ownCloud users worked around these problems and went a long way towards a single source of truth for file metadata. CERN even added mtime propagation to their EOS project and used it to completely replace the ownCloud storage layer using extensive reverse proxy rules … don’t ask … it is messy.
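To make the two-sources-of-truth problem concrete, here is a minimal sketch in Go. It is a toy model, not ownCloud's actual schema or scanner: the `Cache` type, its `Scan` method and the `TagsAfterMove` helper are all made up for illustration, and I am assuming a scanner that can only recognize files by path.

```go
package main

import "fmt"

// Cache is a toy stand-in for a path-indexed file cache.
// Hypothetical: it does not mirror the real oc_filecache table.
type Cache struct {
	nextID int
	byPath map[string]int // path -> fileid
}

func NewCache() *Cache {
	return &Cache{nextID: 1, byPath: map[string]int{}}
}

// Scan reconciles the cache with the paths currently on disk.
// A path it has never seen gets a fresh fileid; vanished paths are dropped.
func (c *Cache) Scan(onDisk []string) {
	seen := map[string]bool{}
	for _, p := range onDisk {
		seen[p] = true
		if _, ok := c.byPath[p]; !ok {
			c.byPath[p] = c.nextID
			c.nextID++
		}
	}
	for p := range c.byPath {
		if !seen[p] {
			delete(c.byPath, p)
		}
	}
}

// TagsAfterMove tags a file, moves it on disk behind the cache's back,
// rescans, and returns the tags visible before and after.
func TagsAfterMove() (before, after []string) {
	cache := NewCache()
	tags := map[int][]string{} // fileid -> tags, the "additional metadata"

	cache.Scan([]string{"report.txt"})
	tags[cache.byPath["report.txt"]] = []string{"important"}
	before = tags[cache.byPath["report.txt"]]

	// Same bytes, new path: the scanner cannot tell it is the same file.
	cache.Scan([]string{"archive/report.txt"})
	after = tags[cache.byPath["archive/report.txt"]]
	return before, after
}

func main() {
	before, after := TagsAfterMove()
	fmt.Println("tags before move:", before)
	fmt.Println("tags after move and rescan:", after)
}
```

Running it prints `[important]` before the move and `[]` after it. The tag is still in the map, but it hangs off a fileid that no path points to anymore.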
2. Request based code execution
The LAMP stack makes it hard to serve a lot of clients: if you keep too many connections open they will eat your RAM, because every connection requires its own PHP process. That can be an Apache process with mod_php or a php-fpm process. It does not matter. The request-driven nature of PHP requires significant resources per client connection. Even if you add more RAM or more machines, the processes will be waiting for db or cache queries most of the time, because in PHP land network IO is blocking. So any db query or Redis query will block the process. That is why projects like ReactPHP, Amp and Swoole all reimplemented their own MySQL and Redis drivers. The PHP ecosystem is based on synchronous IO.
In contrast, the whole Go ecosystem is built with async IO in mind. It is a systems language, much better suited to keeping a million connections open on a single machine, making real push notifications for clients possible. No more 30 sec polling. Instant push anyone?
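As a rough illustration of why idle clients are cheap in Go, here is a small sketch. It does no real networking: each goroutine merely stands in for one open client connection waiting for a push, and the `Notify` function is a name I made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Notify parks n goroutines, each standing in for one idle client
// connection, then wakes all of them with a single broadcast.
// It returns how many of them received the event.
func Notify(n int) int {
	event := make(chan struct{}) // closing it acts as the broadcast
	var wg sync.WaitGroup
	var mu sync.Mutex
	delivered := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-event // parks this goroutine without pinning an OS thread
			mu.Lock()
			delivered++
			mu.Unlock()
		}()
	}

	close(event) // the "instant push": every waiter wakes up
	wg.Wait()
	return delivered
}

func main() {
	fmt.Println("clients notified:", Notify(10000))
}
```

Ten thousand waiting goroutines fit comfortably into one process here. Getting the same number of idle connections out of a process-per-connection model means ten thousand PHP processes.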
3. A single language app framework
Today, anyone trying to develop an app for ownCloud has to build it in PHP, or build a wrapper around whatever technology they use to integrate it with ownCloud. The problem is that app developers are forced to integrate on the server layer when they should be able to integrate on the web layer.
CERN has started the CS3 APIs as the next generation protocol that allows vendor-neutral integration with a storage platform. Basically a gRPC-based iteration of WebDAV. Well, that oversimplifies it, but you get the idea.
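To show what integrating on the web layer means in practice, here is a sketch of a client that lists a folder with a plain WebDAV PROPFIND. It deliberately does not use the CS3 APIs. `Propfind` and `FakeServer` are hypothetical helpers, and the fake server is only a local stand-in so the example runs without a real storage backend.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Propfind sends a WebDAV PROPFIND with Depth: 1 to url and returns
// the HTTP status code and the raw response body.
func Propfind(client *http.Client, url string) (int, string, error) {
	body := `<?xml version="1.0"?>` +
		`<d:propfind xmlns:d="DAV:"><d:prop><d:getetag/></d:prop></d:propfind>`
	req, err := http.NewRequest("PROPFIND", url, strings.NewReader(body))
	if err != nil {
		return 0, "", err
	}
	req.Header.Set("Depth", "1")
	req.Header.Set("Content-Type", "application/xml")

	resp, err := client.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(data), err
}

// FakeServer is a local stand-in for a WebDAV endpoint (an assumption
// for this sketch); it answers PROPFIND with an empty 207 Multi-Status.
func FakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			if r.Method != "PROPFIND" {
				w.WriteHeader(http.StatusMethodNotAllowed)
				return
			}
			w.WriteHeader(http.StatusMultiStatus)
			fmt.Fprint(w, `<d:multistatus xmlns:d="DAV:"/>`)
		}))
}

func main() {
	srv := FakeServer()
	defer srv.Close()

	status, body, err := Propfind(srv.Client(), srv.URL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(status, body)
}
```

Nothing in the client cares what language the server is written in. That is the property an app framework should give app developers.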
Interesting times ahead
I doubt I will get to work on search in the near future … well, actually implementing it on top of reva is one of the next steps for nexus.
Oh, you haven’t heard of nexus? Hm … that is a topic for another post. For now let me know what you think of the three problems I laid out above.