Including external links in search index

Hello,

I'm currently trying to get external links to show up in the search results of the search block. I assume the first problem is that external links are not being indexed at all.

However, I haven't been able to figure out where I can specify that they should be indexed too. Apparently a distinction is made somewhere between real pages and external links.

We use external links a lot, to link to catalogs that live outside our C5 installation but on the same hosting (these aren't files, but flipping books). Since we can also define attributes on these links (using a modification described in another thread here), this offers a nice way to include these catalogs in the search results.

Can anybody point me in the right direction? Many thanks in advance!

 
Briann replied:
Does anybody have an idea where to look? I had already looked into this quite extensively before starting the thread, but I haven't been able to figure it out yet.
pvb replied:
Looking for the same.
Briann replied:
I have been looking into this some more, but without any success yet.

concrete/core/jobs/index_search_all.php contains the following code (around line 36):

$db = Loader::db();
$db->Execute('truncate table PageSearchIndex');
$r = $db->Execute('select Pages.cID from Pages left join CollectionSearchIndexAttributes csia on Pages.cID = csia.cID where (ak_exclude_search_index is null or ak_exclude_search_index = 0) and cIsActive = 1');
while ($row = $r->FetchRow()) {
    $q->send($row['cID']);
}


However, the CollectionSearchIndexAttributes table contains no data for external links. Am I correct in assuming that this could be related to external links not being indexed? It would make sense, since external links do not officially support attributes.

Furthermore, I noticed that the job's progress indicator counts 201 items (in my case), while only 168 pages are indexed afterwards. The difference of 33 is exactly the number of external links...
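A quick way to check whether the gap really matches the external links is to split the queued rows by type. The snippet below is only a sketch on hypothetical sample rows: the function name splitQueuedRows is made up for illustration, and it assumes that the Pages table marks an external link with a non-empty cPointerExternalLink column.

```php
<?php
// Hypothetical helper: splits rows (shaped like Pages table rows) into
// regular pages and external links.
// Assumption: external links have a non-empty cPointerExternalLink value.
function splitQueuedRows(array $rows)
{
    $pages = array();
    $externalLinks = array();
    foreach ($rows as $row) {
        if (!empty($row['cPointerExternalLink'])) {
            $externalLinks[] = $row['cID'];
        } else {
            $pages[] = $row['cID'];
        }
    }
    return array('pages' => $pages, 'externalLinks' => $externalLinks);
}

// Made-up sample data standing in for the rows the job would queue.
$rows = array(
    array('cID' => 1, 'cPointerExternalLink' => null),
    array('cID' => 2, 'cPointerExternalLink' => 'http://example.com/catalog-a/'),
    array('cID' => 3, 'cPointerExternalLink' => null),
);

$result = splitQueuedRows($rows);
printf(
    "queued: %d, pages: %d, external links: %d\n",
    count($rows),
    count($result['pages']),
    count($result['externalLinks'])
);
```

If the number of external links found this way equals the difference between queued and indexed items, the links are being queued but dropped later in the indexing step.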
Briann replied:
I had another look at this.
The job itself updates the CollectionSearchIndexAttributes table, but only for pages. Apparently the external links are excluded somewhere (as I suspected before).

The weirdest thing is that the progress indicator starts with the count of pages plus external links (as mentioned in my previous message), yet in the end only pages are updated.
I cannot find the code that generates this progress indicator, so I haven't been able to inspect it.

I have also tested whether it makes any difference if an external link has an attribute assigned that is set to be indexed. It doesn't.

Help is really appreciated, as I've run out of ideas by now!
Thanks in advance.
Briann replied:
Since I still need this, I had another look at it.

I can see that core/jobs/index_search_all.php uses core/libraries/indexed_search.php.
Judging from the code, however, it should build a list of all pages and index them. The external links are dropped somewhere, but I can't find out where exactly.

I have already checked quite a few things, such as versions, approval, and whether external links might be treated as system pages or as inactive, but all of these seem fine. Judging from this, an external link should behave just like a page.

I suspect that the external links are queued for indexing, but that the actual indexing never takes place. (This would also explain the progress indicator.) After all, the query in index_search_all.php selects external links as well as pages.

Probably, the problem is in core/libraries/indexed_search.php, either in the part where a page list is generated (around line 124) or in the actual reindexing part.

What am I overlooking here? Help is much appreciated!
Briann replied (2 attachments):
Yay!
I have finally found it. And it was surprisingly simple...

In concrete/core/models/collection.php, on line 214, there is a check for aliases. (These should obviously not be indexed.)

What is not obvious, however, is that this check also catches external links.
Therefore, the isAlias() call should be replaced by a call to a new function, isRealAlias(), which can be defined in page.php.

Since you should NEVER edit core files, we can create two overrides instead: one for the change in collection.php and one for the new function isRealAlias(). I have attached these files (based on Concrete 5.6.3.3) to this post. The locations are indicated in the files. (Be sure to rename the files to .php.)

Hope this is useful for somebody!
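
For readers without access to the attachments, the idea can be sketched roughly as follows. This is not the attached code: PageSketch is a hypothetical stand-in class rather than the real Page model, shouldBeIndexed() is a made-up name for the guard in collection.php, and the property names cPointerID and cPointerExternalLink, as well as the body of isAlias(), are assumptions about how 5.6 distinguishes aliases from external links.

```php
<?php
// Hypothetical stand-in for the relevant parts of the Page model.
class PageSketch
{
    public $cPointerID;           // assumed: > 0 when the page is an alias of another page
    public $cPointerExternalLink; // assumed: target URL when the page is an external link

    public function __construct($cPointerID = 0, $cPointerExternalLink = null)
    {
        $this->cPointerID = $cPointerID;
        $this->cPointerExternalLink = $cPointerExternalLink;
    }

    // Assumed core behaviour: true for aliases AND for external links.
    public function isAlias()
    {
        return $this->cPointerID > 0 || $this->cPointerExternalLink != null;
    }

    // The new function: true only for aliases that point to another page.
    public function isRealAlias()
    {
        return $this->cPointerID > 0;
    }

    // Made-up name for the guard before reindexing: skip real aliases only.
    public function shouldBeIndexed()
    {
        return !$this->isRealAlias();
    }
}

$page  = new PageSketch();
$alias = new PageSketch(42);
$link  = new PageSketch(0, 'http://example.com/catalog/');

var_dump($page->shouldBeIndexed());
var_dump($alias->shouldBeIndexed());
var_dump($link->shouldBeIndexed());
```

With a guard like this, regular pages and external links pass through to the indexer, while aliases of existing pages are still skipped.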