Full text search with OCR

Hi there,

I would like to have a full text search available on my owncloud box.
I already installed elasticsearch and added the URL to owncloud’s config.
The search seems to work pretty fine now.

In addition to the existing full text search, I would like to have OCR scanning, so that scanned PDF files are automatically parsed for text, which would then be indexed and searchable as well.
Is it possible to add OCR to the full text search?

Best regards,
TomS

Hey,

I’m not sure there is a strict relation between OCR and full text search. AFAIK, if there were an OCR app for ownCloud (I don’t think such an app exists), the full text search would automatically pick up OCR-scanned documents once they are placed into ownCloud. Or am I wrong?

No idea if it’s going to work, but you can try https://stackoverflow.com/questions/33307541/configure-elasticsearch-attachment-mapper-to-use-ocr-plugin

Basically, ownCloud will send the file to elasticsearch for indexing (no change to ownCloud’s current behaviour), and elasticsearch will be the one using OCR to extract the text and index it properly.
As far as I know, it should work as long as elasticsearch is capable of working with the OCR.

Note that this solution is purely a matter of setting up elasticsearch correctly. Once elasticsearch is configured correctly, it should work from ownCloud without any change.
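
To make the flow a bit more concrete, here is a rough Python sketch of the kind of request bodies involved. It assumes elasticsearch is using its ingest attachment processor (an assumption on my side; the linked question is about the older mapper attachment plugin), and the field name `data` is only an illustration. It sends nothing over the network, and it does not show whether the text extraction behind the processor will actually run OCR on a scanned PDF. That part still depends on the elasticsearch setup.

```python
import base64
import json


def build_attachment_pipeline(field="data"):
    """Return an ingest pipeline body that runs the attachment processor on `field`.

    The field name is a hypothetical choice for this sketch.
    """
    return {
        "description": "Extract text from binary attachments",
        "processors": [{"attachment": {"field": field}}],
    }


def build_document(raw_bytes, field="data"):
    """Wrap raw file bytes as a base64 string, the form in which binary content is indexed."""
    return {field: base64.b64encode(raw_bytes).decode("ascii")}


if __name__ == "__main__":
    pipeline = build_attachment_pipeline()
    # Placeholder bytes standing in for a real scanned PDF.
    document = build_document(b"%PDF-1.4 placeholder bytes")
    print(json.dumps(pipeline, indent=2))
    print(json.dumps(document, indent=2))
```

The first body is what would be stored as the pipeline definition, and the second is what a client would send for each file. If OCR works on the elasticsearch side, the extracted text would end up in the index without the client doing anything different.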
