Index content beyond concrete5

How would I add pages to my search / index that are not managed by concrete5 but exist in other areas of my website?

Is this even possible?

j3ns
 
Remo replied:
You can override generate_sitemap.php and derive your own class from Concrete5_Job_GenerateSitemap. It requires a bit of PHP coding, but it's certainly possible.
Mainio replied:
I think Remo's answer does not specifically answer the question but it's on the right track.

I also think this is the part that is very much lacking from concrete5. So, to answer the original question: this is not currently possible with concrete5 (if I understood the question correctly).

What I think this is about (from my own experience) is this: if I build an integration with an external products database, I'd like to feed that data straight into the search index, even though this data does not have any actual pages in concrete5. The data would then show up in the search results, and the links would point to a URL where the page is generated dynamically from the external database.

This is somewhat simpler to do for the sitemap.xml that Remo is referring to but for the search index, I don't believe it is possible currently with concrete5.

The problem is that all the search functionality expects the PageSearchIndex table to contain actual references to concrete5's page objects. So there is no way of feeding external data into that table (and having it work consistently with e.g. the search block) without adding pages for that data to concrete5.

The only option I've found for building something like this, in a way that supports both the automatic sitemap.xml generation and the search tools, is to create duplicate pages for this data in concrete5. That means programmatically generating the pages from the given data, for example within a custom job. I don't think there is currently any other way to achieve this. This approach has its own problems (e.g. if the data needs to be up to date 100% of the time), but often it's good enough.
j3ns replied:
Absolutely correct. This is about integrating concrete5 with another eCommerce / PIM solution to combine a really intuitive cms with an existing (heavily customized) eCommerce system.

I had to learn that concrete5 depends on page IDs for pretty much every aspect of the sitemap, search, and internal linking, so I will look into the workaround of programmatically generating pages that are not accessible by the user but serve as a reference for all the features that depend on those IDs.

Can anyone point me to any plug-ins that might come in handy to get this done or serve as a reference? I learnt about concrete5 only a few days ago and I am impressed by its ease of use (especially when using the "designer content" plug-in).

:)
Mainio replied (Best Answer):
I don't believe there are any add-ons for that. These problems are always so specific that they are very hard to generalize.

Here's a guide on creating a page within concrete5 programmatically:
http://www.concrete5recipes.com/tricks/create-concrete5-page-progra...

And remember that you need to somehow map the pages to the external IDs (e.g. the product ID or whatever you need to link with in the external system) so that you can update the data later and don't end up creating duplicate pages.
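
The mapping described above can be sketched roughly as follows. This is only an illustration of the create-or-update logic, written in Python rather than as concrete5 code: `PageStore`, `create_page`, `update_page`, `sync_records`, and the record fields are all hypothetical stand-ins, and in a real integration the pages and the external-ID mapping would live in concrete5 and its database.

```python
from dataclasses import dataclass, field


@dataclass
class PageStore:
    """Hypothetical in-memory stand-in for the CMS pages (not a concrete5 API)."""
    pages: dict = field(default_factory=dict)           # page_id -> page data
    by_external_id: dict = field(default_factory=dict)  # external_id -> page_id
    _next_id: int = 1

    def create_page(self, external_id, data):
        # Create a new page and remember which external record it belongs to.
        page_id = self._next_id
        self._next_id += 1
        self.pages[page_id] = dict(data)
        self.by_external_id[external_id] = page_id
        return page_id

    def update_page(self, page_id, data):
        # Overwrite the stored data of an existing page.
        self.pages[page_id] = dict(data)


def sync_records(store, records):
    """Create or update one page per external record; return (created, updated)."""
    created = updated = 0
    for record in records:
        external_id = record["id"]
        data = {"name": record["name"], "description": record["description"]}
        page_id = store.by_external_id.get(external_id)
        if page_id is None:
            store.create_page(external_id, data)   # first time we see this record
            created += 1
        else:
            store.update_page(page_id, data)       # known record: update, no duplicate
            updated += 1
    return created, updated


store = PageStore()
products = [{"id": "SKU-1", "name": "Chair", "description": "Oak chair"}]
sync_records(store, products)  # first run creates the page
sync_records(store, products)  # second run updates it instead of duplicating it
```

The important part is the lookup by external ID before anything is created; running the sync twice with the same records leaves exactly one page per record.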

I've personally built it so that concrete5 still serves these pages but the content is fetched from the external system during the request and displayed within concrete5. It worked the best for that situation.

Here's also an interesting thread I found about the subject:
http://www.concrete5.org/community/forums/customizing_c5/programmat...

Especially the opinions of @jordanlev in that thread. :)

Well, I personally wouldn't look at it so negatively, although the integration we did only had about 200 pages per language in 7 different languages, so it wasn't a case of tens of thousands of pages, where you will probably end up having some additional difficulties.

One related difficulty is PHP's execution time limit, if you're planning to run these jobs through cron and wget. You will hit the usual limit of 30 or 60 seconds very quickly with something like this. You could raise the limit for that specific request, but then you're tying up one httpd process to handle your job request. This got the sysadmins quite confused in our situation: they thought something was wrong because one httpd process was tied up every night (the update took about 5-15 minutes every time).

Therefore, I'd strongly suggest batch-executing these kinds of jobs/scripts:
https://github.com/ahukkanen/c5_cmd...
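
To make the execution-limit problem above more concrete, here is a minimal sketch of a job that works within a time budget and reports where to resume on the next run. It is an illustration only, written in Python; it is not taken from the linked project and is not a concrete5 API. `run_with_budget`, the 25-second default, and the resume index are assumptions made for the example.

```python
import time


def run_with_budget(items, process, start_index=0, budget_seconds=25.0,
                    clock=time.monotonic):
    """Process items from start_index until done or the time budget runs out.

    Returns the index to resume from on the next run (len(items) when finished).
    """
    deadline = clock() + budget_seconds
    index = start_index
    while index < len(items):
        if clock() >= deadline:
            break  # stop cleanly before the request would hit the execution limit
        process(items[index])
        index += 1
    return index
```

The caller would have to store the returned index somewhere between runs and pass it back as `start_index`. A batch run outside the web server, as suggested above, avoids this bookkeeping because the whole update can run in one go.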