Reconsider the current caching functionality

I talked to Remo about better caching for concrete5.

I want to post an approach and an idea about how we can do it "better"! ;-)

Maybe Remo can also post his experiences and ideas here.

Let me know what you think. Hope to hear from you soon.


=== APPROACH: the PHP way ===
In our example, pretty URLs are active. Download the whole website (only .html files) with wget into a subdirectory called "cache" that contains an ".htaccess" file with "deny from all". The wget run can be done hourly in a cron job, for example.

wget --recursive --no-host-directories --accept=html --exclude-directories=/files/,/index.php/ http://yourwebsite/


We replaced the content of the "index.php" in the root folder with something like this:

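<?php
// Unchecked proof of concept: map the request path to a pre-generated file.
// Assumes REQUEST_URI starts and ends with "/", as it does with pretty URLs.
$filename = './cache' . $_SERVER['REQUEST_URI'] . 'index.html';
if (file_exists($filename)) {
   // Cached copy found: send it and stop before concrete5 is loaded.
   readfile($filename);
   exit;
}
// No cached copy: let concrete5 build the page as usual.
require('concrete/dispatcher.php');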


Don't forget this is just an approach. Correctness of REQUEST_URI is not checked, etc. etc.!

This speeds up everything: if a cached version of a page is available, only the cached version is sent out and concrete5 is not loaded into memory, which also means no database queries etc.

This solution does not work with dynamic blocks (forum, guestbook etc.). But it would be possible to check whether the request is a POST request, and in that case skip the cached file and let concrete5 (dispatcher.php) do everything.
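
The checks mentioned above could be sketched roughly like this. It is only a sketch, assuming the cache layout produced by the wget call (one index.html per page directory). The function name cache_file_for_request and its parameters are made up for illustration and are not part of concrete5.

```php
<?php
// Hypothetical helper (not part of concrete5): returns the cache file to
// serve for a request, or null if concrete5 should handle the request.
function cache_file_for_request($method, $requestUri, $cacheDir)
{
    // Only plain GET requests are served from the cache; POST etc. go to concrete5.
    if (strtoupper($method) !== 'GET') {
        return null;
    }
    // A query string usually means dynamic output, so skip the cache.
    if (strpos($requestUri, '?') !== false) {
        return null;
    }
    // Refuse anything that could escape the cache directory.
    if (strpos($requestUri, '..') !== false || strpos($requestUri, "\0") !== false) {
        return null;
    }
    $path = trim($requestUri, '/');
    $file = rtrim($cacheDir, '/') . '/' . ($path === '' ? '' : $path . '/') . 'index.html';
    return is_file($file) ? $file : null;
}

// Possible use in index.php (commented out so the sketch has no side effects):
// $file = cache_file_for_request($_SERVER['REQUEST_METHOD'], $_SERVER['REQUEST_URI'], './cache');
// if ($file !== null) { readfile($file); exit; }
// require('concrete/dispatcher.php');
```

Returning null for everything doubtful keeps concrete5 as the fallback, so a wrong guess costs speed but never serves the wrong content.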


=== ANOTHER IDEA: mod_rewrite and stuff ===
For this idea we also keep a cached version of the pages in a subdirectory. When a request comes in, we check with mod_rewrite whether the page exists as an HTML file. If it does, we send out that file. If not, we let "index.php" (concrete5) do all the work.

So how can we keep the cached files up to date? One idea would be (as in the approach above) to download the whole website, in a cron job for example. Another idea is to save the whole page to the cache file when the page is edited in edit mode and saved (maybe this can be solved with the concrete5 events).

Either way, we have to check with rewrite conditions whether the request is, for example, a POST request, and then let the concrete5 index.php handle it.


=== PROBLEMS ===
For both solutions we have to know which pages have only static blocks (content blocks, flash blocks etc.). If a page has a guestbook or a forum, it is not cacheable, so it should be handled via "index.php" and not be cached.

Maybe it's possible to tag the blocks with something like "cacheable = true/false". If a page has a block with "cacheable = false" the whole page is not cacheable, so it is not written to the cache file and not handled via the PHP script (see approach) or mod_rewrite (see idea).
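
As a rough sketch of that rule: a page is cacheable only if every block on it is tagged cacheable. The data shape (an array of blocks with a 'cacheable' key) and the function name are assumptions for illustration. concrete5 has no such flag yet.

```php
<?php
// Hypothetical data shape: each block is an array with a 'cacheable' flag.
// This flag does not exist in concrete5; it only illustrates the idea.
function page_is_cacheable(array $blocks)
{
    foreach ($blocks as $block) {
        // A missing flag is treated as "not cacheable" to stay on the safe side.
        if (empty($block['cacheable'])) {
            return false;
        }
    }
    return true;
}
```

Treating untagged blocks as not cacheable means third-party blocks would be safe by default and could opt in later.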

scalait
 
scalait replied on at Permalink Reply
scalait
wget --recursive --no-host-directories --accept=html --exclude-directories=/files/,/index.php/ http:[SLASH][SLASH]yourwebsite[SLASH]


replace [SLASH] with "/"
Remo replied on at Permalink Reply
Remo
This is what the cache extension I had in mind does:

If a page changes (on_page_update etc. - http://www.concrete5.org/help/building_with_concrete5/developers/mv...), the extension should check which kinds of blocks are used on the page. There are certain blocks (content, image, file) which always have the same output. Some blocks (form) generate dynamic output (the token, for example) and therefore can't be cached.

If only "static" blocks were found, the page could be defined as "cacheable". The output of the page could then be saved in a static file structure.

This page:
/team/person_a

would generate this file:
/cache/team/person_a/index.html
or:
/cache/team/person_a.html


mod_rewrite could then be used to display the static file instead of calling concrete5 to generate the page.

A bit easier, but also a bit slower, would be a more advanced dispatcher which checks for these statically generated files...

This cache would improve the performance of concrete5 a lot for certain pages and is not too difficult to build.
andrew replied on at Permalink Reply
andrew
We've done full content caches before, where the entirety of the HTML is saved and then delivered (with dramatic speedup, as you might imagine) but never dependent on blocks. You're right though, there are plenty of blocks which you wouldn't want to cache. You would also likely have to disable the cache for any logged-in users.

Let's keep pondering on this. Further approaches? In 5.3.3 we already have the search indexing for pages automatically updated when a page is approved, so it'd be pretty trivial to add another check to that process which generates a full-content cached page (if that were enabled in sitewide settings, of course).

Additionally it'd be pretty easy to build this as an addon using concrete5 events, which might be a nice, low-impact way for someone to build it and for everyone to test it out.
Remo replied on at Permalink Reply
Remo
I never found a way to enable global events within a package?

What did I miss?
Remo replied on at Permalink Reply
Remo
I talked to Marc about this on the phone as well..

I'm currently doing an IP check but that's not going to work for everyone.

mod_rewrite is probably a bit tricky to use because we would have to check some cookie/session stuff.

Doing it in index.php would be easier and would allow us to check for a session.
scalait replied on at Permalink Reply
scalait
Maybe it's an idea to add a query string to the URL if a user is logged in (for example http://yourwebsite/about/?loggedin)...

Then we can check very easily via mod_rewrite whether we want cached content or content handled via concrete5.

Or is it possible in any way to check via mod_rewrite whether the user is logged in? Can we read the content of a cookie/session and check that via mod_rewrite?
Remo replied on at Permalink Reply
Remo
We should probably first decide whether mod_rewrite is the way to go or not..

mod_rewrite
+ faster
- needs a different .htaccess file
- might need core change for logged in check

dispatcher
- slower
+ easy session check
+ doesn't need a .htaccess modification
+ easier to pack into an addon (question about events above?)

anything I've missed?
okhayat replied on at Permalink Reply
okhayat
Whatever you're discussing here (caching, as far as I understand), please don't tie it to a specific technology/software.
I'm using Cherokee, which doesn't support .htaccess. Others might be using lighttpd or nginx, etc.

Just my 2c.
Remo replied on at Permalink Reply
Remo
I agree.. That's one more point to add to the list:

mod_rewrite
+ faster
- needs a different .htaccess file
- might need core change for logged in check
- not supported by all webservers

dispatcher
- slower
+ easy session check
+ doesn't need a .htaccess modification
+ easier to pack into an addon (question about events above?)
scalait replied on at Permalink Reply
scalait
As far as I know, Cherokee supports mod_rewrite. But I think we cannot keep all those run-of-the-mill webservers in mind...

Anyway we already use mod_rewrite in concrete5.
okhayat replied on at Permalink Reply
okhayat
Cherokee has a 'Redirection' handler, which can use regular expressions to rewrite the URL.
So whatever rules you write in the .htaccess file have to be set up in the config file or via the web interface instead.
There is a blog post that shows how to do it:
http://blog.walterebert.com/using-pretty-urls-in-concrete5-with-che...
scalait replied on at Permalink Reply
scalait
Why not do both solutions? For people who need speed: mod_rewrite.

For all others, or people who can't use mod_rewrite etc.: the PHP solution.
defunct replied on at Permalink Reply
defunct
Andrew, is there a tutorial somewhere that explains how one could do a full page cache for concrete5?

For a lot of sites where there isn't much dynamic content this would be awesome.

Please let me know :)

Thanks,

Justin
mose replied on at Permalink Reply
mose
Full page caching is not available in Concrete5 at this time. A module has been written as a proof of concept (and others have developed a customized solution), but in order for page caching to be more efficient, it needs to be incorporated into the core. Various solutions are currently being explored, but it probably won't be available at least until the next major release of Concrete5.
scalait replied on at Permalink Reply
scalait
Hello Remo!

Can you upload the things you've already done. Your PHP code, rewrite settings etc.?


Best wishes,

Marc
axelhahn replied on at Permalink Reply
axelhahn
Hi there,

I am completely new to Concrete 5 (I am just evaluating it) and have some know-how in caching on a high-traffic site.

One current problem I see is that a page response cannot be stored in the browser cache, because of these response headers:
Cache-Control   no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma   no-cache

The client always asks the webserver for the content, even if the user uses the back button in the web browser.

I think the way to go is the one with the dispatcher. It has to know whether a request is cacheable or not. I can give you a zip with a kind of

--- A request is not cacheable:
The dispatcher ignores cached content and always delivers a dynamic response if any of these is true (OR condition):
- request method is not GET
- requested URL has a "?"
- requested URL is "/concrete" or ... (whatever - define it in a regex)

Things that could require a dynamic page too (but which I cannot handle yet because of my limited knowledge) are:
- dynamic blocks on a page
- inactive page


--- Requests to a static page:
We have static content to deliver if none of the conditions for dynamic requests is true.
The cached content will/can be delivered if all of these are true (AND condition):
- the cache file already exists
- the last-modification file of the CMS exists
- the last modification in the CMS is older than the timestamp of the cache file (+hit dirty period)
If not, the first user is the "loser" who has to wait for the generation/update of the static cache file.
But all subsequent requests will profit.
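
The two rule sets above can be sketched in PHP roughly as follows. This is not the code from ah_dispatch.php. The function names, the default regex and the reading of the "dirty period" as a grace period in seconds are assumptions for illustration.

```php
<?php
// Hypothetical helpers illustrating the rules above; names, the default
// regex and the meaning of $dirtyPeriod (seconds of grace) are assumptions.

// True if the request must be answered dynamically (OR condition).
function request_is_dynamic($method, $url, $excludeRegex = '#^/concrete#')
{
    return $method !== 'GET'
        || strpos($url, '?') !== false
        || preg_match($excludeRegex, $url) === 1;
}

// True if the cache file may be delivered (AND condition).
function cache_is_fresh($cacheFile, $lastModFile, $dirtyPeriod = 0)
{
    clearstatcache(); // make sure filemtime() sees current values
    if (!is_file($cacheFile) || !is_file($lastModFile)) {
        return false;
    }
    // The last CMS modification must be older than the cache file (+ dirty period).
    return filemtime($lastModFile) < filemtime($cacheFile) + $dirtyPeriod;
}
```

A dispatcher would call request_is_dynamic() first and only look at the cache file when it returns false.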

Static Content is delivered with header 
Cache-Control   max-age=X [s]


Additionally:
Low-traffic pages can profit from that cache too if you touch a file inside the CMS whenever a user stores a file or modifies content, for example:
- changes in blocks on a page (content changes, position changes)
- file uploads
- changes in scrapbooks

ah_dispatch.php is in download below...
http://axel-hahn.de/axel/download/ah_dispatch_.zip...

(instructions for changes in concrete/dispatcher.php - in the readme file)

Axel
Remo replied on at Permalink Reply
Remo
good stuff Axel!

Nice to know there's some c5 knowledge in Bern!

Marc, sorry for my late answer. I was/am very busy and missed your message.

I'll check the dispatcher addon axel uploaded and see whether I can include my code to detect dynamic pages automatically.
scalait replied on at Permalink Reply
scalait
Any news here? If we could really get this to work, we could use a reverse proxy (mod_proxy) or servers like Varnish (http://varnish.projects.linpro.no/) to serve our customer sites very quickly.
Remo replied on at Permalink Reply
Remo
no, unfortunately not.

I played around a bit but found several small issues.

The first one is something you guys might be able to help with. You can't render a View instance into a string: it uses ob_start and ob_get_contents, and hooking into that to get a string instead of output didn't work well. I have to admit I haven't tried everything.

I could also use curl, file_get_contents or fsockopen to get the HTML content of a page, but for a package I'd like to release that doesn't seem to be a good choice.

So the question is: can I render a page into a string without using an additional TCP connection?
axelhahn replied on at Permalink Reply
axelhahn
Hi Remo,
My dispatcher works with file_get_contents() and opens the requested URL with the URL param ?create=1. This request runs through the same code. That's why my dispatcher handles these URLs with "?" and quits. If you use a similar mechanism with any function using TCP (file_get_contents, curl, whatever), the dispatcher must be able to tell which requests were initiated to generate a cached file. I used a URL param, but it could be done with an HTTP header too.

As for your question about not using TCP: I only know the option with the output control functions you already tried:
http://www.php.net/manual/en/ref.outcontrol.php...
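
In plain PHP terms, capturing output into a string with those functions looks roughly like the sketch below. The helper name capture_output is made up, and whether concrete5's View cooperates with an outer buffer like this is exactly the open question, so take it as an illustration only.

```php
<?php
// Runs $render (any callable that echoes HTML) and returns what it printed.
// How to make concrete5 render a page inside such a callable is not shown here.
function capture_output($render)
{
    ob_start();
    try {
        call_user_func($render);
    } catch (Exception $e) {
        ob_end_clean(); // drop partial output before re-throwing
        throw $e;
    }
    return ob_get_clean();
}
```

Output buffers nest, so an inner ob_start()/ob_get_contents() pair inside the callable does not prevent the outer buffer from collecting whatever is finally echoed.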

EDIT:
I was playing with an example on php.net.
http://www.php.net/manual/de/function.ob-start.php#94193...

A working example to play with ...
original index.php in webroot
<?php 
require('concrete/dispatcher.php');


replace it with:
<?php
// Output-buffer callback: PHP passes the buffered output and a bitmask of status flags.
function ob_handler($string, $flags) {
        if ( $flags & PHP_OUTPUT_HANDLER_END ) {
         // Last call for this buffer: headers are not sent yet and can still be replaced.
         // to see headers: $string = implode("<br />\n", headers_list());
         // TODO:
         // if cacheable / not logged on -> override headers to store in browser cache
         header("Pragma: ");
         header("Expires: ");
         header("Cache-Control: max-age=300"); // 5 min, value is in seconds
        }
        return $string; // pass the page through unchanged
}
ob_start('ob_handler');
require('concrete/dispatcher.php');


What it does: when PHP_OUTPUT_HANDLER_END is reached it does something... Here it replaces the headers (which must be sent before any response body) to enable the local cache. At that point you can, for example, check the header variables and enable a server-side cache if the page is static.

It is just an example. The problem with the header: if you log in after the first cached request, your browser gives you content from its local cache and you cannot edit it without a refresh.

EDIT2:
Just wait a bit, I will add it to my dispatcher. I will attach the files when it is ready.
axelhahn replied on at Permalink Reply 1 Attachment
axelhahn
Hi Remo,

here is a version that captures content with an output buffer instead of a 2nd request.
I tested with c5.331. The zip contains the original dispatcher as a copy, the modified one and my files. Extract them to webroot/concrete.

Processing starts in the lower part of ah_dispatch.php (search for the // MAIN comment).

It runs fine with my site (static content) and handles CMS actions. If a user is logged in, I touch a lastmodification file which is my flag. Cache files older than lastmodification will be ignored and renewed.

Comparison:
Using the c5.331 dispatcher I have 3 requests per page (page [400-500ms] + main.css + typography.css), ~800ms; with my page cache I have 1 request to the page, ~80ms.

At the end of each request ob_handler() will be reached. Here I handle the writing of the cache files.
I removed the global maxage for cached content. I store an expiry date next to the response body for each request. This allows setting specific expiry dates, so some pages can be static for a few days and other pages with dynamic blocks (with RSS feeds, ... whatever) expire after a couple of minutes. It is set with iMaxAgeOnServer. iMaxAgeOnServer=0 never stores a cache file.
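
Storing an expiry date next to the response body could look roughly like this. It is not the format used in the zip. The function names, the serialized-array layout and the explicit $now parameter (passed in to keep the sketch testable) are assumptions.

```php
<?php
// Hypothetical storage format: expiry timestamp and body serialized into one file.
function cache_write($file, $body, $maxAgeOnServer, $now)
{
    if ($maxAgeOnServer <= 0) {
        return false; // 0 means: never store a cache file
    }
    $entry = array('expires' => $now + $maxAgeOnServer, 'body' => $body);
    return file_put_contents($file, serialize($entry), LOCK_EX) !== false;
}

// Returns the cached body, or null if the entry is missing, corrupt or expired.
function cache_read($file, $now)
{
    if (!is_file($file)) {
        return null;
    }
    $entry = @unserialize(file_get_contents($file));
    if (!is_array($entry) || !isset($entry['expires'], $entry['body']) || $entry['expires'] <= $now) {
        return null;
    }
    return $entry['body'];
}
```

Keeping the expiry inside the cache file means each page carries its own lifetime and no global setting has to be consulted when reading.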
Can you check the TODO part inside ob_handler()?

Axel
Remo replied on at Permalink Reply
Remo
Thanks Axel,

I'll look at it as soon as possible.

But modifying the dispatcher is not what I wanted. I think the page cache should be an addon and I therefore didn't want to modify any core code..

But I'll have a look at your code, maybe it's still the best way to improve the performance of c5.
okhayat replied on at Permalink Reply 1 Attachment
okhayat
I've tested the Memcache method, and it seems to work fine.
File attached if you'd like to play with it.
- Rename to .php and copy it into concrete/libraries/cache/
- Add the following line to your config/site.php
<?php
define('CACHE_LIBRARY','memcache');
?>
- You need to have memcached, and php5-memcache installed. On Debian:
# apt-get install memcached php5-memcache
Remo replied on at Permalink Reply
Remo
nice!

But I think it might become obsolete with the next c5 version. It seems to use Zend Cache, which supports APC, memcache, SQLite...

I haven't checked all the new stuff in Subversion, but it seems to be there.
Tony replied on at Permalink Reply
Tony
What if the HTML of certain blocks could be cached? For some blocks you wouldn't want this, of course (forums, any that change based on the user viewing them, etc.). But tons of other blocks, like content, popup, youtube, googlemaps, html, image, file/details, etc., are pretty much always static after you edit them.
Remo replied on at Permalink Reply
Remo
This is basically what I do, but I don't create a cache for each block instance. Just one per page.


Your approach would have the advantage that if a page contains several "dynamic blocks" it would still be partially cached.

But having lots of small caches is still a lot slower than having one cache file per page.


I think both approaches would be great, but different. If you have a simple homepage with a few autonav and content blocks, my addon would improve the performance a lot, whereas a per-block cache wouldn't improve it that much...

My approach can reduce the database calls to "0" for a lot of pages, at least if I were able to hook into better places.

----

1. I'd still like to keep my approach because of the reasons mentioned above
2. Having an attribute "btCacheOutput" in each block would be awesome too. Set it to true and the output gets cached
Remo replied on at Permalink Reply 1 Attachment
Remo
I've updated my cache addon.

It's not completely tested but it seems to work on my testsite so far..
tbcrowe replied on at Permalink Reply
tbcrowe
Thanks, Remo. I'm going to give it a try next chance I get. I'll let you know if I have any feedback.
defunct replied on at Permalink Reply
defunct
This breaks for me on 5.4.0.5

Fatal error: Call to a member function getCollectionObject() on a non-object in /home/xxxx/public_html/meepmeppe/packages/remo_cache/models/remocache.php on line 14
defunct replied on at Permalink Reply
defunct
Never mind, found your Google Code page for this and am trying that one out.

http://code.google.com/p/concrete5-cache/...
mose replied on at Permalink Reply
mose
If you haven't seen this already, you might be interested in it. I have posted my latest version of remo's code.

http://www.concrete5.org/community/forums/chat/cacheand039ing-andam...