Character encoding issue in tag cloud results

I get lots of "???" in my tag cloud search results where there should be extended characters (typographic quotes, accented letters, etc.), e.g.:
http://brandculture-se.com/index.php/blog/?akID[11][atSelectOptionI...
If you click through to a page, you'll notice the same content is displayed without any problems. Even a regular search for the same term looks OK in the results; it's only mangled in the tag cloud search results. Any ideas?

jawbonelid
 
adajad replied:
I have the exact same problem, and I also get the same result when I search for any word containing an extended character.

I have tried editing the regex, and also this:

```php
if(!empty($matches[0])) {
    $body_length = 0;
    $body_string = array();
    foreach($matches[0] as $line) {
        $line = html_entity_decode($line, ENT_QUOTES, APP_CHARSET); //added decoding of html characters for readability (doesn't work)
        $body_length += strlen($line);
        $body_string[] = $this->highlightedMarkup($line, $highlight);
        if($body_length > 150)
            break;
    }
    if(!empty($body_string))
        return @implode("", $body_string); //removing dots in results. original: return @implode("....", $body_string);
}
```

...but no.

And before you ask, both APP_CHARSET and DB_CHARSET are set to UTF-8.

EDIT: I also have the same encoding problem (every now and then) in the dashboard, wherever (I guess) search is used to display results, e.g. in breadcrumbs, the file manager, user groups, etc.
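For anyone debugging this, here is a general PHP illustration (not the concrete5 search code itself) of why decoding entities alone may not be enough. Once an entity such as &rsquo; is decoded into a real UTF-8 character, byte-oriented functions like strlen() and substr() count and cut bytes, not characters, and a cut that lands inside a multibyte character leaves invalid UTF-8, which browsers typically render as "?" or a replacement glyph. The sample string is made up, and whether this is exactly what happens in the tag cloud results is an assumption.

```php
<?php
// Illustration only: the sample string below is hypothetical, not taken from the DB.
$stored = 'It&rsquo;s caf&eacute; time';

// Decode entities into real UTF-8 characters (same idea as the APP_CHARSET attempt above).
$decoded = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

// strlen() counts bytes: the right single quote is 3 bytes and é is 2 bytes in UTF-8.
$bytes = strlen($decoded);

// Count characters instead, using PCRE's UTF-8 mode (works without mbstring).
$chars = preg_match_all('/./us', $decoded);

// Cutting at a byte offset can split a multibyte character:
// here we keep "It" plus only the first byte of the right single quote.
$cut = substr($decoded, 0, 3);

// preg_match() with the /u modifier returns false for invalid UTF-8 input.
$cutIsValid = preg_match('//u', $cut) === 1;

printf("bytes=%d chars=%d cutIsValid=%s\n", $bytes, $chars, $cutIsValid ? 'yes' : 'no');
```

Running it prints bytes=17 chars=14 cutIsValid=no: the decoded text has 14 characters but 17 bytes, and the 3-byte cut is no longer valid UTF-8.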
adajad replied:
Is there no one who can give us a hint?
drewR replied:
Hi adajad,
It looks like you are using 5.4.2.2.
Tag Cloud seems to be working smoothly in 5.5.1.
I would suggest upgrading to 5.5.1.
adajad replied (1 attachment):
Actually I'm on 5.5.0 (see attached), but I will upgrade to 5.5.1 and see if that solves anything.
adajad replied (1 attachment):
Nope... 5.5.1 still gives me the same output. See attached example.
drewR replied:
I have added the Blog Tag Cloud issue to the bug tracker.
You can follow it here:
http://www.concrete5.org/developers/bugs/5-5-1/blog-tag-cloud-garbl...
drewR replied:
Hi Adajad,
Andrew has fixed the tag cloud issue in our latest development branch (which can be accessed on GitHub), and the fix will be included in our next official release.

If you want to apply the patch to your site directly, you will have to modify two files.
You can find the patch on GitHub:
https://github.com/concrete5/concrete5/commit/cfea5bfddd45f88743cea9...

This is the GitHub commit. You could probably apply it directly to your site with little risk of breaking anything.
Best,
drew
jawbonelid replied:
This patch doesn't appear to solve my original problem (typographer's quotes)?

Joe
adajad replied:
Take a look at your database and make sure the table btcontentlocal has the correct character set and collation. Looking through my db, I found that every special character had been saved as an HTML entity instead of the actual character, and when search fetches content, the search controller strips certain characters, which then scrambles your output.

So what I did: I altered the table btcontentlocal to use the preferred character set and collation, and then re-saved my content blocks (opened each one and saved it without changes). Fortunately I didn't have more than about 60 content blocks.

This all comes down to me not using the correct character set when creating the db on initial install.
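To check whether your own stored content has the same problem before altering tables and re-saving blocks, a small helper along these lines could flag entities that stand for non-ASCII characters. The function name nonAsciiEntities() is hypothetical, and it only inspects a string you pass in; fetching the block content from btcontentlocal is left out. Entities such as &amp;, &lt; and &gt; are skipped on purpose, because HTML content legitimately needs them.

```php
<?php
// Hypothetical helper: return the distinct HTML entities in $html that decode
// to non-ASCII characters (e.g. &rsquo;, &eacute;, &#8220;), in order of first appearance.
function nonAsciiEntities(string $html): array
{
    // Match named (&eacute;), decimal (&#8220;) and hex (&#x2019;) entities.
    preg_match_all('/&(?:#\d+|#x[0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/', $html, $m);

    $found = [];
    foreach ($m[0] as $entity) {
        $char = html_entity_decode($entity, ENT_QUOTES, 'UTF-8');
        // Unknown entities come back unchanged; ASCII results (&amp;, &lt;, &gt;)
        // are needed in HTML, so only non-ASCII results are reported.
        if ($char !== $entity && preg_match('/[^\x00-\x7F]/', $char)) {
            $found[] = $entity;
        }
    }
    return array_values(array_unique($found));
}

// Usage with a made-up sample: lists &rsquo;, &eacute;, &#8220; and &#8221;.
print_r(nonAsciiEntities('<p>It&rsquo;s caf&eacute; &amp; more: &#8220;quoted&#8221;</p>'));
```

A non-empty result suggests the raw UTF-8 characters could have been stored instead, which is the situation described above.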
jawbonelid replied:
Looks like my db has HTML entities instead of raw characters too. The DB and tables are UTF-8, though; that should be OK? If HTML-encoded data is getting into the db, surely that has something to do with the HTML page that captured the data?

Joe
adajad replied:
Thanks