Make content in single pages searchable

Hi,

Can anyone advise on how to include content from single pages in the C5 search index?

I have a single page which reads content from a database of job listings, so I need my single page to be able to somehow return results to the search.

thanks in advance.

Sadu
 
slafleche replied on at Permalink Reply
Greetings,
The way concrete searches blocks is in the controller, there's a function called getSearchableContent(), which returns a searchable string. Example:

public function getSearchableContent() {
    // Concatenate the block's fields into one searchable string
    $content = array();
    $content[0] = $this->field_1;
    $content[1] = $this->field_2;
    $content[2] = $this->field_3;
    return implode(' - ', $content);
}

I'm not sure whether it will work for single pages, but you can have a controller for one. If I remember correctly, you need a file with the same name in your theme root...

If that doesn't work, I have another idea. It's a hack, so the first option would be preferable. You could make a page that does a PHP include of your single page, but exclude it from the nav, page lists, and sitemap. You could then put a redirect on it to your desired page.

These are untested solutions, but hopefully it helps out.
Sadu replied on at Permalink Reply
Thanks for the response, though I'm not sure either of these approaches will help in this instance.

The fundamental issue is that I am using a single page to represent hundreds of actual pages.

Eg...

example.com/jobs/ is the single page - this is a job index page.

example.com/jobs/123/ is a job detail page. It passes the Job ID (123) to the single page controller, and does a database lookup to get the job data. This page has information about (say) a web developer job. And I'd like the example.com/jobs/123/ page to appear in the search results if someone searches for "Web developer" in the search box.

What I need is for my page to be included in the search index several times, once for each job. This doesn't seem to be possible, as the primary key for the PageSearchIndex table is the cID, and all my job pages share the same cID, so this is a problem.

In this case, I don't think any kittens will die as a result of jobs not being searchable, but it's something to keep in mind. If you need the content to be searchable, then a single page may not be the best way to go. It may be better to create each job as its own page in Concrete5, with its own cID and all that good stuff.

Alternatively, I can see the possibility of changing the PageSearchIndex table to be indexed by cID AND cPath, so you can have multiple entries in the search index per single page. I'm not sure how difficult this would be, but it would almost certainly require core changes, I would think. I may look into this further if the client squawks!
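The composite-key idea above might look something like this at the database level. This is an untested, hypothetical sketch: the actual PageSearchIndex schema varies between concrete5 versions, and the exact column definitions here are assumptions.

```sql
-- Hypothetical: allow one search-index row per job URL by keying
-- on the page's path as well as its collection ID.
ALTER TABLE PageSearchIndex
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (cID, cPath);
```

Even with the key changed, the core indexer would still need to be taught to write one row per job, so this is only part of the picture.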

And also note that none of this affects the site's visibility in Google: Google can index the site just fine, so maybe it's easier to not use the C5 search and install a Google search instead.
slafleche replied on at Permalink Reply
Yes, each of your jobs needs its own page to be indexed properly by concrete. You would use the function I posted above to make its content indexable by concrete.

However, if everything's in a database, you might want to make a custom search. You could do it with SQL, Google, or something like Solr.

I've tried to use the Google Site Search and you're kind of limited.

Solr is a little complicated to set up at first, but it's really powerful and lets you do anything you want. It is not a web crawler (although you can pair it with one): it only searches files, but you have total control over what it searches and how. So you could have a job that runs however often you need, generating an XML or JSON file on your server with all the data you want searchable. Then you make a query to the Solr application with GET parameters and it shoots back XML, JSON, and I think possibly one or two other formats. Check it out, it's open source.
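To make the query step above concrete: a Solr query is just an HTTP GET with parameters. Here's a minimal sketch in JavaScript; the local Solr URL, the core name ("jobs"), and the field name ("title") are all assumptions for illustration, not part of any real setup.

```javascript
// Build a Solr "select" query URL from a user's search string.
// The core name ("jobs") and field name ("title") are hypothetical.
function buildSolrQuery(baseUrl, searchTerm) {
  const params = new URLSearchParams({
    q: "title:" + searchTerm, // search the title field
    wt: "json",               // ask Solr for a JSON response
    rows: "10",               // cap the number of results returned
  });
  return baseUrl + "/select?" + params.toString();
}

const url = buildSolrQuery("http://localhost:8983/solr/jobs", "web developer");
console.log(url);
```

Fetching that URL (with fetch() in the browser, or any HTTP client server-side) returns the matching documents as JSON, which you can then render however you like.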
infiniteuptime replied on at Permalink Reply
I just finished a document publishing application for our university department that maintains its own sitemap file for search engines, but it would be a huge help if concrete5 could somehow recognize a single page multiple times, with different parameters for each instance.

Even if there was just one concrete5 search result that passed along the search string when you clicked on it, that would be functional enough for me. I could then attach all of the searchable content to the single page and have the listing interpret the search results for itself. The concrete5 search would present a result like "Search the EECS Library", and clicking on it would show detailed results for documents that have been published.

I just haven't figured out what would need to be changed in the core to make this possible. I will get back to this thread when I find out more, unless someone else already has an idea.
slafleche replied on at Permalink Reply
I don't think that's possible with concrete's search.

However, if each result had a unique URL, it would be possible with any other crawler. (Making the URL unique with GET variables won't work; it'll still be seen by the crawler as the same page.)

For your unique URLs, you might be able to do it with JavaScript. Say you have a search page: you could change the contents of the page with an AJAX call, then modify the URL without a refresh:

window.history.pushState("object or string", "Title", "/new-url");


You would also need to load the correct "page" if someone typed in one of those URLs directly, of course. Then the web crawler should be able to find them.
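To sketch that idea, the two helpers below map a job ID to a unique URL and back, matching the /jobs/123/ scheme from earlier in the thread; the comments show how they would pair with pushState and popstate in the browser. The helper names, and the loadJob function mentioned in the comments, are hypothetical.

```javascript
// Map a job ID to the unique URL the crawler will see, and back.
// The /jobs/<id>/ pattern follows the URL scheme discussed above.
function jobUrl(jobId) {
  return "/jobs/" + jobId + "/";
}

function parseJobId(path) {
  const match = path.match(/^\/jobs\/(\d+)\/$/);
  return match ? Number(match[1]) : null;
}

// In the browser, you would pair these helpers with the History API:
//   window.history.pushState({ jobId: 123 }, "Job 123", jobUrl(123));
//   window.addEventListener("popstate", function (e) {
//     if (e.state) loadJob(e.state.jobId); // loadJob: hypothetical fetch-and-render
//   });

console.log(jobUrl(123));              // "/jobs/123/"
console.log(parseJobId("/jobs/123/")); // 123
```

On initial page load you would run parseJobId(window.location.pathname) and, if it returns an ID, render that job directly, so typed-in URLs work too.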
infiniteuptime replied on at Permalink Reply
As I mentioned in my above message, the pages already work perfectly with external crawlers/engines, but I'm referring to the internal search.

I also realize that it's not currently possible with concrete5's search, and that's why I'm talking about adding to the core. If we can find a solution that only needs the slightest change to the core, I will try to get that included in a new release. That way, it will be possible in the future without hacking away at our installations.

I also have some minor changes that would allow external authentication systems to be integrated very easily, but I need to work further on that before it's ready.