Utf8Num analyzer needs PCRE unicode

When I try to execute a search I receive the following error:

Unable to complete search: Utf8Num analyzer needs PCRE unicode support to be enabled

Also, when I index the search engine under the Maintenance panel, I get the message "failed to open stream: No such file or directory" in the Results of the Last Run column. Could the problem be that my server runs PHP 5.1.6 instead of PHP 5.2? When I run phpinfo(), I see that PCRE support is enabled.

 
vercasson replied:
Look for "--with-pcre-regex..." in the configure command on your phpinfo page. PHP is often configured with this flag, which can break PCRE Unicode support.

This was the case with my server. I compiled php without the flag and everything worked fine.

The flag "--enable-utf8" also helps.
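
For anyone who wants to confirm whether their PHP build is affected before recompiling, here is a minimal diagnostic sketch. The function name pcre_unicode_report is made up for illustration. It only uses preg_match with the /u modifier and a \pL Unicode property class, both of which fail on a PCRE build without Unicode support.

```php
<?php
// Diagnostic sketch (hypothetical helper name): report what this PHP build's
// PCRE library can do. The "@" hides the warning PHP raises when a pattern
// cannot be compiled, so a missing feature simply shows up as false.
function pcre_unicode_report() {
    // The /u modifier needs UTF-8 support: "ü" (2 bytes) must match as ONE character.
    $utf8 = @preg_match('/^.$/u', "\xC3\xBC") === 1;
    // \pL additionally needs Unicode property support.
    $properties = @preg_match('/\pL/u', 'a') === 1;

    return array('utf8' => $utf8, 'unicode_properties' => $properties);
}

print_r(pcre_unicode_report());
```

If either value comes back false, the Utf8Num analyzers are likely to refuse to run on that server.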
vercasson replied:
Yes, my research led me to recompile PHP.
chestercat replied:
After speaking to the people who manage our server, I was told that custom PHP builds are not supported with their hosting software. Is there any other way to get the search to work without recompiling PHP, or will I need to use a different server?
vercasson replied:
Maybe one of the C5 guys has a workaround? Good luck.
andrew replied:
This is a shot in the dark, since I've never been on a server that's had this problem, but...

We're using the Zend Framework's Lucene search indexer implementation, and I think if you use an analyzer other than the Unicode one, you won't need this.

Check out concrete/libraries/indexed_search.php

Anywhere you see

Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive


change it to

Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive


You will also probably have to modify blocks/search/controller.php as well.
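
Instead of hard-coding one class name, the choice could be made at runtime. This is only a sketch: choose_analyzer_class is a hypothetical helper, not part of concrete5 or Zend, and it assumes the same PCRE test that produces the error message. Note that the TextNum analyzers treat text as single-byte, so non-ASCII characters may still be indexed incorrectly.

```php
<?php
// Sketch (hypothetical helper): pick an analyzer class name depending on
// whether PCRE can compile a Unicode pattern on this server.
function choose_analyzer_class() {
    $hasUnicode = @preg_match('/\pL/u', 'a') === 1;

    return $hasUnicode
        ? 'Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8Num_CaseInsensitive'
        : 'Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive';
}

$class = choose_analyzer_class();

// Only touch Zend when it is already loaded, so the sketch also runs standalone.
if (class_exists('Zend_Search_Lucene_Analysis_Analyzer', false) && class_exists($class, false)) {
    Zend_Search_Lucene_Analysis_Analyzer::setDefault(new $class());
}

echo $class, "\n";
```

The same class has to be used for indexing and for searching, which is why both indexed_search.php and the search block controller need the change.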
chestercat replied:
I changed the code in the two files and it got rid of the "Utf8Num analyzer needs PCRE unicode" error message, but now I receive the message "Unable to complete search: failed to open stream: No such file or directory". I get the same message when I try to reindex the search engine in the dashboard.
Remo replied:
This doesn't change a lot.

I debugged a few things and here are a few hints:

- indexed_search: on some pages an umlaut showed up as two characters. I could fix that with utf8_decode($this->getBodyContentFromPage($c)).

- query string: the keyword I'm searching for needs to be encoded as well. Firefox passes it like "%F6ffentliche", so this needs utf8_encode($q) in the controller.

In my case, everything looks like proper UTF-8 text now, but it still doesn't work! Lucene really bugs me. I will try to find the problem after Christmas.
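
A rough sketch of the query string fix, written so that keywords that already arrive as UTF-8 are not encoded twice. The helper name query_to_utf8 is made up, the mbstring extension is assumed to be available, and any non-UTF-8 input is assumed to be ISO-8859-1, as in the Firefox example above.

```php
<?php
// Sketch (hypothetical helper, assumes ext/mbstring): make sure a search
// keyword is UTF-8 before it is handed to Lucene.
function query_to_utf8($q) {
    // Already valid UTF-8: leave it alone to avoid double encoding.
    if (mb_check_encoding($q, 'UTF-8')) {
        return $q;
    }
    // Assumption: anything else is ISO-8859-1, e.g. "%F6" for "ö".
    return mb_convert_encoding($q, 'UTF-8', 'ISO-8859-1');
}

$fromLatin1 = query_to_utf8(urldecode('%F6ffentliche'));
$fromUtf8   = query_to_utf8(urldecode('%C3%B6ffentliche'));

var_dump($fromLatin1 === $fromUtf8); // both forms end up as the same UTF-8 string
```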
Remo replied (1 attachment):
Hey Andrew,

I'm a bit late for Christmas, but I fixed almost all the problems with Lucene.

It looks like using the right indexer isn't enough. It always truncated my text at a character like "ü" (instead of "Gründung", the index contained only "Gr").

I was able to fix this problem with the patch I've attached.
Please note that you probably also have to set entities to raw in the TinyMCE configuration (entities: "raw"). Otherwise the index contains HTML entities like &uuml; and not the proper Unicode character ü!
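
If content was already saved with entities, one possible alternative is to decode them before the text goes into the index. This is a sketch only and not part of the attached patch; decode_entities_for_index is a hypothetical helper built on PHP's html_entity_decode, and UTF-8 is assumed as the target charset.

```php
<?php
// Sketch (hypothetical helper): turn HTML entities back into real characters
// before indexing, assuming the index should hold UTF-8 text.
function decode_entities_for_index($html) {
    return html_entity_decode($html, ENT_QUOTES, 'UTF-8');
}

// "&uuml;" becomes the two-byte UTF-8 sequence for "ü".
echo decode_entities_for_index('Gr&uuml;ndung'), "\n";
```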

It should work, but be careful with the charset. ISO-8859-1 is not UTF-8, so using my patch might cause some problems.



UPDATE: According to the documentation you could also use setlocale(LC_ALL, 'de_DE.iso-8859-1'), but I'm not a big fan of setlocale since it doesn't work on all servers out there. It also doesn't work per thread but rather per process, which means the locale might change even if you don't call setlocale, just because another script within the same process called it.
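
If you do go the setlocale route, a sketch like the following at least checks whether the locale exists and restores the previous one afterwards. The helper name with_ctype_locale is hypothetical, and it sets only LC_CTYPE instead of LC_ALL to limit side effects, which is my assumption rather than what the documentation suggests. It does not solve the per-process problem described above.

```php
<?php
// Sketch (hypothetical helper): run a callback under a temporary LC_CTYPE
// locale and put the old locale back afterwards.
function with_ctype_locale($candidates, $callback) {
    // Passing "0" only returns the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');
    // Returns the first locale that could be set, or false if none is installed.
    $applied = setlocale(LC_CTYPE, $candidates);

    $result = call_user_func($callback);

    if ($applied !== false) {
        setlocale(LC_CTYPE, $previous);
    }
    return array('applied' => $applied, 'result' => $result);
}

// Example: "phpversion" stands in for the real indexing call.
$out = with_ctype_locale(array('de_DE.iso-8859-1', 'de_DE'), 'phpversion');
var_dump($out['applied']); // false means the locale is not installed on this server
```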