Extending search

I am building a new site with 5.8.0.2 and need to index an ever-growing collection of custom data that forms inter-relationships across the entire site. This data is stored as serialised records in the block tables, and I have already written optimised retrieval functions for it. I need to provide or somehow extend search indexing so that these relationships can be indexed and thus searched. I have searched (no pun intended) for an API or a way to hook custom index functions into the indexer so that they can return the relationships. If search doesn't provide this then I may have to write my own, which I do not relish.
Can anyone point me in the right direction please?

Thanks
Martyn

FaganSystems
 
FaganSystems replied (Best Answer):
OK, in case anyone finds this and has the same issue: I have progressed in my understanding of how this works, both the indexer and how search results are found and rendered.

When an index job is executed it cycles through all of the blocks on all of the active pages and calls getSearchableContent() in each block controller. This function returns a single string containing all of the indexable content in the block; anything you want in the index can be concatenated and returned. The results from the collection are written into the PageSearchIndex table for use in search.
So that answers my original question.
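For anyone following the same route, the idea can be sketched roughly as below. This is a language-neutral illustration in Python, not concrete5 code: the RelationshipBlock class, its fields and build_page_index() are hypothetical names, and only the method name mirrors getSearchableContent() from the block controller.

```python
from dataclasses import dataclass, field


@dataclass
class RelationshipBlock:
    """Hypothetical stand-in for a block that stores related items."""
    title: str
    related_names: list[str] = field(default_factory=list)

    def get_searchable_content(self) -> str:
        # Concatenate everything that should be searchable into one string,
        # which is what the indexer expects back from each block.
        parts = [self.title, *self.related_names]
        return " ".join(p.strip() for p in parts if p and p.strip())


def build_page_index(blocks: list[RelationshipBlock]) -> str:
    # Assumed behaviour: the indexer visits every block on a page and
    # joins the strings they return into one searchable text per page.
    return " ".join(b.get_searchable_content() for b in blocks)
```

The point is simply that related names, labels or anything else pulled out of the serialised records can be appended to the returned string, and they then become searchable along with the rest of the page.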
Having changed all of my code, I then turned my attention to the search side. By following the code execution I can see that the user-entered query is used to do a simple string search on the table, which will find any single word or exact phrase as it was returned from the indexer.
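To show why that feels limited, here is a minimal sketch that assumes the search behaves like a plain case-insensitive substring match over the indexed text. The simple_search() function and the sample rows are made up for illustration and are not the actual search code.

```python
def simple_search(index_rows: dict[int, str], query: str) -> list[int]:
    """Return the ids of rows whose content contains the query as typed."""
    # Assumption: a case-insensitive substring match, similar in spirit
    # to a SQL LIKE '%query%' against the indexed content.
    needle = query.lower()
    return [
        page_id
        for page_id, content in index_rows.items()
        if needle in content.lower()
    ]


rows = {1: "Red apple pie", 2: "Apple and red cabbage"}
print(simple_search(rows, "apple"))      # both rows contain the word
print(simple_search(rows, "red apple"))  # only the exact phrase matches
print(simple_search(rows, "apple red"))  # same words, wrong order: no match
```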
I will post a separate question for the next part, but in brief I find the search far too basic for my needs, so I will either try to use the Intelligent Search found in the admin panel or write my own version of both the indexer and the search to handle my specific needs.
I hope this helps someone in the future. If I manage to build something I will announce it in the forums.
JohntheFish replied:
My usual approach to this is a new list class inheriting from the original with a cleverer keywords filter. Typically this means splitting the keywords string on white space and adding a subexpression for each keyword (so an 'and' relationship between them), or building a more complex subexpression to create an 'or' relationship between keywords.

It can be taken further to 'or' each keyword between a series of columns and implement primitive stemming.

It's not as sophisticated as fulltext, but it can be layered on top of any database item list derivative.
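As a rough sketch of that keyword filter (written in Python rather than as a concrete5 list class, so build_keyword_filter() and the column names are hypothetical), the splitting and the 'and'/'or' combination might look like this. No stemming is attempted.

```python
def build_keyword_filter(
    keywords: str, columns: list[str], mode: str = "and"
) -> tuple[str, list[str]]:
    """Return a SQL WHERE fragment and its parameters for a keyword string."""
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    words = keywords.split()  # split on any run of white space
    if not words or not columns:
        return "1=1", []  # nothing to filter on
    clauses: list[str] = []
    params: list[str] = []
    for word in words:
        # 'or' each keyword across the series of columns.
        # Column names are interpolated, so they must come from trusted code.
        per_column = " OR ".join(f"{col} LIKE ?" for col in columns)
        clauses.append(f"({per_column})")
        # LIKE wildcards inside the keyword itself are not escaped here.
        params.extend([f"%{word}%"] * len(columns))
    joiner = " AND " if mode == "and" else " OR "
    return joiner.join(clauses), params


where, params = build_keyword_filter("red  apple", ["title", "body"])
print(where)
print(params)
```

In a real list class the fragment and parameters would be handed to whatever query builder the list already uses; the placeholder style shown here is only an assumption.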
FaganSystems replied:
Thanks for the reply, John. Having spent most of the night working on an issue with a client Stateside, I also started to consider leveraging the strengths of MySQL to improve the search. I had intended to try implementing MySQL's MATCH ... AGAINST function, only to find that PageList already supports it. Because I had already implemented my own variant of the search box, it was a simple matter to replace the call to filterByKeyWords with filterByFullTextKeywords. Now I can get results on any combination of keywords even if they are in the wrong order, and can also use + and - to refine the search, for example:

keyword1 +keyword2 +keyword3
Results must contain keyword2 and keyword3, and any that also contain keyword1 will be returned higher up the results. All you need is MySQL 5.6 to make it work.
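To make the operator behaviour concrete, here is a simplified model of it in Python. It only mimics the rules described above (+ means required, - means excluded, a bare word is optional and raises the rank); parse_boolean_query() and rank() are hypothetical helpers, and the scoring is a crude count, not MySQL's actual relevance calculation.

```python
def parse_boolean_query(query: str) -> tuple[list[str], list[str], list[str]]:
    """Split a query into (required, optional, excluded) keywords."""
    required, optional, excluded = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, optional, excluded


def rank(documents: dict[int, str], query: str) -> list[int]:
    """Return matching document ids, best match first (toy scoring)."""
    required, optional, excluded = parse_boolean_query(query)
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        if any(w not in words for w in required):
            continue  # a +keyword is missing
        if any(w in words for w in excluded):
            continue  # a -keyword is present
        if not required and not any(w in words for w in optional):
            continue  # nothing matched at all
        # Each optional keyword found pushes the document up the results.
        score = sum(1 for w in optional if w in words)
        scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


docs = {
    1: "keyword3 keyword2",
    2: "keyword2 keyword1 keyword3",
    3: "keyword1 keyword2",
}
print(rank(docs, "keyword1 +keyword2 +keyword3"))
```

Document 3 is dropped because it lacks keyword3, and document 2 comes first because it also contains the optional keyword1. Word order inside the documents makes no difference.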