Best practices on translating dynamic content...

In the interest of reaching consensus on a topic that is important but scattered all over the place, I'm starting this thread.

Short version:
Should we wrap things that could create site-specific dynamic content in the t() function?
The euro gang wants this because they need to translate the values for custom attributes or group names.

That seems like a reasonable need.

Our concerns:
1) There's a performance hit with that. Is it measurable, or not really? I'm leaning towards "not really", but I feel it's important that we pay at least passing attention to it.

2) It also raises a user experience question. The MO file that one generates from a PO file at Transifex is no longer enough to offer the site's UI in German and English: you will need some basic ability to add those strings on the fly within your site. There are some add-ons out there for this, but isn't it kind of weird to have to go to yet another place to do that? If I'm adding a tag to a blog post, I add it in one language. Where do I go to translate it? That seems painful.

3) Is there /any/ other way to approach the problem? Probably not, but I know Andrew and I aren't the only people who wonder whether there's something more graceful. I dunno, probably not...


mlocati replied:
Hi Franz

It's nice to read such posts, thank you for your internationalization efforts.
I'm not sure the gettext approach (.po/.mo files) is the best one here.
Maybe we could consider a database-based solution.

For instance, if we want to localize attribute names, we could create a table like this:
CREATE TABLE AttributeKeysLoc (
   aklAttributeKey INT UNSIGNED NOT NULL COMMENT 'Attribute key identifier',
   aklLocale VARCHAR(15) NOT NULL COMMENT 'Locale identifier',
   aklName VARCHAR(255) NOT NULL COMMENT 'Localized attribute name',
   PRIMARY KEY (aklAttributeKey, aklLocale)
) COMMENT='Localization' COLLATE='utf8_general_ci' ENGINE=InnoDB;

When selecting the attribute row from the db, we could use a query like this:
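$row = Loader::db()->GetRow('
   SELECT AttributeKeys.*,
      IFNULL(aklName, akName) as akName
   FROM AttributeKeys LEFT JOIN AttributeKeysLoc ON
         (AttributeKeys.akID = AttributeKeysLoc.aklAttributeKey)
         AND (? = AttributeKeysLoc.aklLocale)
   WHERE
      akID = ?
', array(Localization::activeLocale(), 12));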

The akName item of the $row array would contain the localized name of the attribute if it's present in the AttributeKeysLoc table, the English name otherwise.

In the dashboard, when administrators have to manage the attributes, we could use an approach like this:
1. If Localization::getAvailableInterfaceLanguages() returns an empty array, everything stays as it is now.
2. When creating a new attribute (or when editing an attribute that doesn't have any translations), concrete5 asks for the attribute name (as it does now), but we could add a "localize name" link that, when clicked, shows one row for each locale. The "neutral" name (the en_US one) should be the only required field.
3. When editing an attribute that already has translations, we immediately ask for the names in every site locale.

I think that this solution:
- would keep the current simplicity and immediacy
- won't break existing code
- won't need much code review
- won't have a great impact on performance (if table indexes are set wisely)
- won't need a system that somehow generates .po files, compiles .mo files and adds those .mo files to the ones handled by the Localization object underlying the t() function.

PS: I'm thinking about writing a how-to that explains how to handle localization correctly, describing the t(), t2() and tc() functions, as well as using the Date helper instead of PHP's date() function. What do you think?

Thank you again
Remo replied:
No, using the t() function, and thus an MO/PO file, would be a last resort imho. If there's an MO/PO file for that, it would at least have to be separate from all other translation files, but I'm also worried that things might clash. The word "set" has a few hundred definitions in English, the word "to" can be translated into other languages in many different ways, and so on. We could use the tc() function to keep things separate, but if there's a different way, we should focus on that.

1) Performance: well, things might get a tiny bit slower. I'd probably use some kind of conditional to ensure we don't run any database queries unless it's really necessary. Judging by this pull request, Andrew isn't going to like that though.
I see two different approaches: load all user translations at once, even if they aren't used, or load them one by one as needed. Both could have benefits; I guess it depends on the situation.
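The two loading strategies can be sketched roughly like this. It's only an illustration: EagerTranslations, LazyTranslations and the $fetchAll / $fetchOne callables are hypothetical names standing in for real database queries, not concrete5 APIs.

```php
<?php
// Hypothetical sketch of two ways to load user-defined (database-backed)
// translations. The callables stand in for real DB queries.

final class EagerTranslations
{
    private $map;

    public function __construct(callable $fetchAll)
    {
        // One query up front: every translation for the locale is loaded,
        // even strings this request never displays.
        $this->map = $fetchAll();
    }

    public function get(string $source): string
    {
        return $this->map[$source] ?? $source;
    }
}

final class LazyTranslations
{
    private $fetchOne;
    private $cache = [];

    public function __construct(callable $fetchOne)
    {
        $this->fetchOne = $fetchOne;
    }

    public function get(string $source): string
    {
        // One query per distinct string, remembered for the rest of the
        // request (a miss is cached as null so it isn't queried again).
        if (!array_key_exists($source, $this->cache)) {
            $this->cache[$source] = ($this->fetchOne)($source);
        }
        return $this->cache[$source] ?? $source;
    }
}
```

Eager loading costs one query per request even if nothing translatable is shown; lazy loading costs one query per distinct string, so it's cheaper on pages with few such strings and more expensive on dashboard pages that list many.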

2) I agree, switching to a different screen is painful, but it's something I've often done with other tools. I could live with that, but if there's a better way, I'd definitely prefer it. We've got a simple helper that prints an input field or text area which, depending on the internationalization add-on, prints one field per language and then serializes everything before saving the data. It's a bit hacky, but such a helper would make it easier for developers to add translatable content.
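A rough sketch of what such a helper could do with the submitted values, assuming one form field per locale. packLocalizedValue() and unpackLocalizedValue() are made-up names, and json_encode() is swapped in for PHP's serialize() to keep the stored value readable; either would work.

```php
<?php
// Hypothetical sketch: pack one-field-per-locale input into a single column,
// and read it back with a fallback locale.

function packLocalizedValue(array $valuesByLocale): string
{
    // Drop empty inputs so untranslated locales fall back when read.
    $filtered = array_filter($valuesByLocale, function ($v) {
        return trim((string) $v) !== '';
    });
    return json_encode($filtered);
}

function unpackLocalizedValue(string $stored, string $locale, string $fallbackLocale = 'en_US'): string
{
    $values = json_decode($stored, true);
    if (!is_array($values)) {
        // Plain-text value saved before the field was made translatable.
        return $stored;
    }
    return $values[$locale] ?? $values[$fallbackLocale] ?? '';
}
```

The fallback to en_US keeps a half-translated value usable, and the is_array() check lets old, non-serialized values keep working.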

3) There are different approaches but none I can imagine that would make a lot of sense.

A few more things:

--- AREA NAMES ---

Something in addition to this discussion: has anyone ever wanted to translate area names? I know this wouldn't be easy to change, but whenever you install a theme from the marketplace you'll see some words that aren't translated. I know, "main", "sidebar" etc. aren't difficult words. It just leaves a "not-so-perfect" impression, and one that appears right at the beginning when you start working with concrete5. Not the biggest issue, but if I were building concrete5 from scratch, I'd definitely put it on the list of things to consider.


In case we go for a database-based solution: please keep in mind that we'd also need a way to install such translations. If I build an add-on that adds an attribute, I'd definitely want to ship that translation with the add-on as well; otherwise we'd still end up with only partially translated add-ons.

Writing this makes me think of a benefit of the MO/PO solution: assuming we'd still use t() functions [or maybe td() or whatever letter you like], we could scan an add-on and let someone translate it.

With a database-based solution, how would I translate an add-on written by someone else?


One important part was merged a few days ago. Since we now wrap block names & descriptions in the t() function, we've got this:
bmatzner replied:
Dear Franz,

Thanks for getting this going. Thanks to you, Andrew, Michele and Remo (and all the others) who have been investing their time and brains in resolving the localization issues.

I see a few different approaches, some more tedious, some simpler.

All I ever needed on localized projects was for editors to have a user interface in their language. Having a few spots that weren't localized for admins was okay. For instance, it's fine if attributes are something the admin adds in one language such as English, as long as they're displayed in the editor's language. It's perfectly fine to have to create a po/mo file for that; after all, this really only applies to multi-language projects, and attributes aren't created and edited every day, are they? That's a perfectly sufficient solution to me. Anything beyond that would be fine as a paid add-on.

I found that t()-wrapping the following methods/variables in various view-level scripts does the job for me:

Not all occurrences need to be wrapped (for instance, $arHandle for area names, as brought up by Remo, only needs to be wrapped in concrete/elements/block_area_footer.php). But this list appears to resolve the untranslated strings in a localized environment. Job done. I don't see how adding those few dozen t() calls could affect performance disproportionately.

The issue of providing translations for new custom attributes in a multilingual environment, where different editors use different languages, can be mitigated by adding new po/mo files. To me, this seems like a rare case, so that should be fine: it's a bit more difficult for an admin, but still something that can be explained in half a page of documentation.

Adding a database-backed translation solution as Michele suggests seems to belong in the "tedious" category, as it would require a translation interface within the c5 backend, which means a lot more new code just to save admins from having to provide po/mo files for multilingual environments. It appears to me that this would only be necessary in setups where different users have different locales apart from English, for instance Spanish and Italian. How many c5 setups have that? I would assume most localized setups have exactly ONE language. For the others, a solution that provides translatable dynamic content (such as attributes) would be necessary, but it would feel like a bonus to me.


Separating model and view:
In a previous pull request I submitted, which was not accepted, some concerns were raised regarding the use of the t() function in models rather than views, which makes sense to me. So it astounds me all the more that
has now been merged, in which some translation actually happens in the model rather than in the view layer.
A meaningful separation would include a decision on where translations are handled: either in models, which return translated values, OR in views, which translate the original strings where necessary. If this is done on different levels, the same string will end up being translated more than once. It would make sense to me if the t() function were used in view files only, i.e. where parameters are echoed.

As the main concern around localization was performance, I would suggest moving all translation to the view. The t() function could then be handled so that it acts as a no-op if only English is used. Michele's code adds checks for which locale is active in more than one spot; that seems like a task the t() function itself should handle. If that's the case, a caching mechanism that loads the translated resources in one go should be easy to implement, and only the view layer would need to read from this cache.
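Roughly what that could look like, assuming every translation source has already been merged into a single array per locale. Translator and its $loadCatalog callable are hypothetical; the point is just the early return for en_US and the one-time catalog load.

```php
<?php
// Hypothetical sketch: one entry point that returns immediately for English
// and otherwise reads a single catalog built once per locale (core .mo,
// package .mo files, database rows, ... merged by $loadCatalog).

final class Translator
{
    private $locale;
    private $loadCatalog;   // callable(string $locale): array
    private $catalog = null;

    public function __construct(string $locale, callable $loadCatalog)
    {
        $this->locale = $locale;
        $this->loadCatalog = $loadCatalog;
    }

    public function translate(string $text): string
    {
        // English source strings need no lookup at all.
        if ($this->locale === 'en_US') {
            return $text;
        }
        if ($this->catalog === null) {
            // Loaded on first use only; a real implementation would also keep
            // this array in a cache shared across requests.
            $this->catalog = ($this->loadCatalog)($this->locale);
        }
        return $this->catalog[$text] ?? $text;
    }
}
```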

A significant issue in providing good translations is that source strings should have some context. Example: the English term "set" can refer to the verb as in "set a value", the noun as in "file set", or the noun as in "group set". When translating those strings, distinguishing between the two different nouns in this example is impossible, but it can be necessary if a language doesn't have an equally universal term like "set". This is a real-life example: our German translation group ended up with something less than optimal by choosing the word "Album", which is the same word as in English and makes total sense for files, but just doesn't fit all the cases where "set" is used.
The solution: using keys instead of the original English strings, such as "file.set".
Pros:
- Clarity
- Allows for distinguishing between terms based on their context
- Fixing a typo in the source language doesn't require fixing all translations
- Shorter terms in the source code; long, potentially multi-line sentences in the source code can be avoided
Cons:
- Refactoring the entire code base
- Looking up what each key
- Possibly heavy duplication of translated strings
- Presumably less performant
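To make the key idea concrete, a sketch of a key-based lookup. translateKey() and the catalog contents are invented for illustration (the German strings are placeholders, not suggested translations).

```php
<?php
// Hypothetical sketch: code refers to keys like "file.set"; every language,
// English included, is just another catalog.

function translateKey(array $catalogs, string $key, string $locale, string $fallbackLocale = 'en_US'): string
{
    if (isset($catalogs[$locale][$key])) {
        return $catalogs[$locale][$key];
    }
    if (isset($catalogs[$fallbackLocale][$key])) {
        return $catalogs[$fallbackLocale][$key];
    }
    // A missing key is shown verbatim, which makes gaps easy to spot.
    return $key;
}

$catalogs = [
    'en_US' => ['file.set' => 'Set', 'group.set' => 'Set', 'action.set' => 'Set'],
    'de_DE' => ['file.set' => 'Dateisammlung', 'group.set' => 'Gruppenset'],
];
```

Note that 'file.set', 'group.set' and 'action.set' all map to the same English "Set", which is the duplication listed among the cons, while each language can still choose a different word per context.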

That's my 2 (euro) cents for now,
jshannon replied:
I'm not very clear on what the requirements are here. Lots of thoughts and potential solutions, but what's the goal? What needs to be translated? In what context? At what speed? Once these things are decided, there are enough smart programmers around that the solution will follow.

What needs to be translated?
"Static" text? package info? attribute names? Part of me (the american) wants to say "meh... the fact that content is internationalizable for the end user is enough...", but the other part ("do it the right way") wants to say "make everything translatable". Remo has a good point about something as inconspicuous as area names having a valid need for translation.

But what about content? URLs?

I'm being somewhat purposefully daft here. Obviously content and URLs should be i18n'able in a different way than the other stuff, as they are today. But where does that end? Single pages use t() today. A page type might have content that is t()'ed. From an editor's perspective, there's not much difference.

Which reminds me... should single pages be considered? I don't know how these work in an international environment (specifically with multiple language "sections").

What about images? Thankfully with decent css these days, this is less of a concern, but it may still be a valid requirement.

Context. This is my word for what controls the target language, and when/where it can be set. Right now, for example, package info is translated in the context of the site language at install time and then stored in the db in that language. Attributes created by the admin are entered in a given language and aren't translatable after that. I noticed in a recent project that making email translatable is a bit of a mess, because emails aren't always sent out in the "context" of the user they are sent to (so if the user is DE but the site is EN, even a t()'ed email will be in EN).
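One way around the email problem is to render each message in the recipient's locale and then switch back. This is only a sketch: LocaleContext and its in-memory catalogs are hypothetical stand-ins, not concrete5 classes, and a real version would read the recipient's language from their user record.

```php
<?php
// Hypothetical sketch: render a message with the recipient's locale active,
// then restore whatever locale the current request was using.

final class LocaleContext
{
    private $current;
    private $catalogs;

    public function __construct(string $current, array $catalogs)
    {
        $this->current = $current;
        $this->catalogs = $catalogs;
    }

    public function t(string $text): string
    {
        return $this->catalogs[$this->current][$text] ?? $text;
    }

    public function locale(): string
    {
        return $this->current;
    }

    /** Run $render with $locale active; the previous locale is always restored. */
    public function withLocale(string $locale, callable $render)
    {
        $previous = $this->current;
        $this->current = $locale;
        try {
            return $render($this);
        } finally {
            $this->current = $previous;
        }
    }
}

$ctx = new LocaleContext('en_US', ['de_DE' => ['Your password was reset.' => 'Ihr Passwort wurde zurückgesetzt.']]);
$recipientLocale = 'de_DE'; // e.g. taken from the recipient's profile
$body = $ctx->withLocale($recipientLocale, function (LocaleContext $c) {
    return $c->t('Your password was reset.');
});
```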

I can't really speak to the requirement here. Personally, I think that t()'ing on creation (or entering things in the local language) should be sufficient. But I guess there are two edge cases here:

1. You install (and thus t()) before you've created your .po file. After the installation, it's too late.
2. Your admin users are multi-lingual. This seems unlikely, but might be a valid requirement.

So what context is required? Will it differ for dashboard vs front-end? Based on site language? Language "section"? Or user-specific language?

Speed. Sure, it should be fast. I guess we can quantify this, too, as one of the requirements....

Any other requirements?
Remo replied:
Well, to me it's important that a German-speaking person who installs concrete5 sees completely localized software. I've been contacted by a few companies that started using concrete5, with a question along the lines of: did you only just start localizing the software?

Unfortunately I couldn't tell them it was fully localized, and basically had to say that we're still working on translating everything. A lot of my customers understand English, so I could probably let them work with English attribute names; it just leaves a bad impression. Now that block names and descriptions will be translatable, one big part of that is done. Attribute names don't show up as early as block names.

* Being able to change the language at any time (point 1 in James' comment), which doesn't work because some strings are saved in the database.
* Multilingual admins (point 2) isn't that important, but it has happened to me a few times. It was more of a community, though, not so much a bunch of admins.
* Performance
* Ability to translate attribute names and see those translations when you edit page properties.
* Ability to translate attribute values, especially for select attributes.
* Ability to translate area names
* Ability to translate group names
* A place to edit these strings


The more I think and write about this, the more I feel that even a simple t()-based solution is enough. Projects where you need a truly multilingual interface are rare, but they certainly exist. If I had to install an add-on and switch to another screen for those projects, I'd be fine with that. Sometimes I don't mind clicking a few more times, as long as it's possible.

Or maybe just an event we can hook into. Here's some pseudo code for attributes:

class Attribute {
   public function getName() {
      $name = $this->name;
      // Let any installed add-on replace the name with a translation.
      $ret = Events::fire('on_data_translation', 'AttributeName', $name);
      if ($ret) {
         $name = $ret;
      }
      return $name;
   }
}

The first parameter would be a unique identifier (AttributeDescription, GroupName, you name it). We could use this in combination with a t() function add-on or even build a simple database interface.

Pros:
* Easily done
* Doesn't affect existing projects
* Works pretty fast (events are cheap)
* Flexibility
Cons:
* Different screen to translate those strings.
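To show the flow of the event idea end to end, here's a self-contained sketch. DataTranslationEvents is a tiny hypothetical dispatcher written only for this example (it is not concrete5's Events class): with no handler registered the name passes through unchanged, and an add-on handler can supply a translation.

```php
<?php
// Hypothetical stand-in for the event mechanism: the core fires
// 'on_data_translation'; an optional add-on handler may return a replacement.

final class DataTranslationEvents
{
    private static $handlers = [];

    public static function extend(string $event, callable $handler): void
    {
        self::$handlers[$event][] = $handler;
    }

    /** Returns the first non-empty handler result, or null if nobody answered. */
    public static function fire(string $event, ...$args)
    {
        foreach (self::$handlers[$event] ?? [] as $handler) {
            $ret = $handler(...$args);
            if ($ret) {
                return $ret;
            }
        }
        return null;
    }
}

function attributeDisplayName(string $name): string
{
    $ret = DataTranslationEvents::fire('on_data_translation', 'AttributeName', $name);
    return $ret ?: $name;
}
```

Without an add-on, the cost is a single loop over an empty handler list, which is why the core itself stays unaffected.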
mlocati replied:
I admit my first post was a bit too pragmatic. That's because a few months ago I had to localize the attribute names, and I did it the way I explained above (I've just put that code here: ).

From a wider point of view, I think we shouldn't rule out multilingual administrators a priori: I agree that's quite uncommon, but in the effort to build a great product it's an important feature.
Assuming that, I believe we should store only English texts in the database (this is the direction I suggested in ).

I've done a fresh c5 install with the current master branch and the Italian localization: all texts have been saved in English. That's good.

Apart from the static strings translated by t(), for which I don't see particular problems at the moment (except what I reported here: ), I agree with Remo that concrete5 should allow translating the following:
- file/page/user attribute names (and select attribute values)
- group names
- area names
I'd also add:
- permission key names
- attribute set names
- attribute type names

All of those could be translated with t(), but that would lead to somewhat strange behavior. Let's say, for instance, that we want to edit the 'Width' file attribute.
In the attributes list we'd see the translated text for Width (in Italian that would be 'Larghezza').
When clicking the attribute to edit it, which text should I see in the "edit name" text box? For consistency, I think we should show the English name (since we save the English text in the DB). And that would be strange: I want to edit 'Larghezza' but I see 'Width'. This is why I implemented the database-based solution: it shows both the English and the localized names.
Furthermore, with the t() approach we would have to generate/update a .po file with all the strings, add an interface for translating them, compile the .po into a .mo file, and add that .mo file to the chain of .mo files accessed by the t() function. Quite heavy...
Remo replied:
Michele, what do you think about the event based approach? With this, we could use a t() based approach as well as a database based approach.
mlocati replied:
Well, I think it's a good solution: the events approach simplifies things by centralizing the problem in one place, the event handlers.
The event handler should do the following:
1. Look in a database table to see if there's a translation
2. If the translation is not found in the database, call t() (or better, tc())
I'd do it in these two passes because the names are translated by the .mo file by default, but users will be able to change the localization.
I'd use tc() instead of t() to get better translations (think, for instance, of translating 'set': tc('AttributeName', 'set') is surely better than t('set')).
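A minimal sketch of that two-pass handler, assuming the database rows and the compiled catalog are available as plain arrays. dataTranslate() is a hypothetical name; the "\x04" separator mirrors how gettext stores context (msgctxt) entries in .mo files.

```php
<?php
// Hypothetical sketch of the two-pass lookup: admin-entered DB translation
// first, then the shipped catalog keyed by context (as tc() would use).

function dataTranslate(string $context, string $text, array $dbRows, array $moCatalog): string
{
    // Pass 1: translation entered by the site admin.
    if (isset($dbRows[$context][$text])) {
        return $dbRows[$context][$text];
    }
    // Pass 2: shipped translation; gettext joins context and msgid with "\x04".
    return $moCatalog[$context . "\x04" . $text] ?? $text;
}
```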

I'm wondering about the usefulness of events: will package developers need to handle those events? I think not: package-defined names would be translated automatically through a .mo file provided with the package. So why not use a static function? I mean, instead of
class Attribute {
   public function getName() {
      $name = $this->name;
      $ret = Events::fire('on_data_translation', 'AttributeName', $name);
      if ($ret) {
         $name = $ret;
      }
      return $name;
   }
}

we could simply use something like
class Attribute {
   public function getName() { return Localization::DataTranslate('AttributeName', $this->name); }
}

The only problem I see here is the add/edit of those names.

About editing: let's say we want to edit a 'Larghezza' attribute (translation of 'Width'): in the edit attribute screen, which name will we view?
- If it's the English one: that would not be very clear and a bit strange.
- If it's the localized name: to edit the other translations we'd have to switch the UI locale (and honestly, if I had to switch to Chinese just to edit an attribute name, I'd have some problems ;)). Sure, that wouldn't be very common, so it's a minor problem.

About adding: when creating a new attribute, users would have to enter the en_US name, otherwise the whole process wouldn't work as expected. And that too would feel a bit strange.

So, I'd add the possibility to specify all the translations in the add/edit UI.
Remo replied:
Not sure I understand you correctly, but are you suggesting checking the database no matter what?

My idea with events was mostly because they're cheap. If you need database-based translations, put that functionality in an add-on; if you're okay with t()'s, install another add-on and you'll have a solution that performs a bit better.

The same goes for tc(): if you need it, go for it, but it's part of the add-on. All the core does is fire an event.

There are about three add-ons that could benefit from that event:
* Database based translation
* t() function based translation
* tc() function based translation

I might have missed one case but I don't think there will be many cases where you need that event.


About the other problem: I guess if someone uses Chinese as the main language, you'll have a lot more problems. If someone creates a sitemap in Chinese, I'll have a hard time translating it into German. That's an issue not just related to attribute and group names.
mlocati replied:
Well, perhaps I didn't explain myself properly.
In particular, I'm referring to the fact that concrete5 ships with some names that currently aren't translated (attribute names, attribute set names, attribute type names, user group names and descriptions, permission names, ...).

If I understand you correctly, you'd just fire the event and not implement a handler for it. In that case, none of the above names will be translated on a fresh install of concrete5.
Did I understand you correctly?
Remo replied:
Yes, but they could be translated by installing a certain add-on.
bmatzner replied:

Thanks for pointing out the tc() function, Michele. That really seems to resolve the "set" issue I raised above.

Apart from that, I'm having a hard time seeing the problem, so I'm trying to think out loud here:

Isn't the t() function the tool for getting a translated string from whatever translation resources exist, based on an input string? Since t() could also take an optional second parameter to behave like tc(), I'll refer to the generic "translate this" function as t().

If so, shouldn't everything that may need to be translated be wrapped in that function?

The function would then need to gather the translation resources and, if an additional database containing extra dynamic strings exists, act as a proxy to all of them, possibly caching them in one resource and reading from it thereafter.

Since Zend_Translate is agnostic about translation sources, and we can't require gettext to be available on the system, it would seem like a good idea to make the t() function capable of handling different kinds of translation sources.

For dynamic content that can be named in the backend, an additional form for adding language versions would be necessary.

Given that the source strings remain English rather than keys that allow more precise translations, t() could then act as a no-op when English is used.

An event-driven architecture would be nice to use, and might even allow getting rid of the t() calls everywhere.

mlocati replied:
@Remo: I'm not sure that leaving strings untranslated by default is a great choice.
That would require explaining to people how to enable translations. Furthermore, most people who just want to try out concrete5 won't spend much time reading how-tos, readme files and so on. They just download and install. And see some untranslated text. That would give the impression that concrete5 is not very mature (as you mentioned elsewhere).
Remo replied:
I agree but I'm not sure it's that much of a problem. Once you're at the point where you need translatable attributes, you're unlikely to be a beginner anymore.

I think it's important that block names are translated because that's something you'll see rather quickly.

I'm not really sure what the best approach is but events have the benefit of not hurting the core.

Let's see what Franz thinks...
mlocati replied:
@Bernd: well, as I mentioned in my first post, I'm thinking about writing a how-to about the t() function family...

About improving/updating the t() function: I'm afraid that would lead to performance problems. That's why I suggested above creating a new function (I called it Localization::DataTranslate).
For the same reason, I wouldn't replace t() functions with events: even if events are cheap, they are still slower than a direct function call.
Remo replied:
Just in case that wasn't clear: I never intended to replace t() with events. I just wanted to use an event for Localization::DataTranslate, as it offers some flexibility without touching a lot of core functionality.
mlocati replied:
Translatable attributes would be nice even when users take a first glance at concrete5. At the moment you see untranslated text in these places:
- when setting page information, like metadata and attributes like "Exclude from nav"
- in the file manager (width/height of images)
- setting page permissions (but since they are read-only, a simple t() call would be enough)
- in user details ("I would like to receive private messages", ...)
Remo replied:
Yeah, you're right.
mlocati replied:
About permissions translations: I just created a pull request. See (which is very similar to )
bmatzner replied:

> About improving/updating the t() function: I'm afraid that it would lead to performance problems.

That is the type of argument that just kills discussions. Why would improving a function lead to performance problems? It's in the nature of an improvement that it doesn't create new problems, otherwise it's not an improvement. You're saying there is nothing that can be done to optimize the t() function without affecting performance. I suggested using a cache that gathers the various translation resources, including those that use a database to store translations, allowing dynamic attributes to be translated (which was your idea). If the t() function reads from one translation source no matter where those translations come from (core mo file, package mo files, custom mo files, database) or how they're managed, and returns the input string right away (on line 1) if the current locale is en_US, wouldn't that be a performant solution?

Hypothetical performance problems seem to be the main reason we're having this discussion. We're also discussing this for hypothetical scenarios in which c5 is used by editors with different languages, and in which adding a .mo file to the languages folder for additional attributes isn't an option because the attributes change so often.

- Is there one use case from the readers' projects that demonstrates such a scenario?
- What specific performance issues do you expect?

James Shannon has been arguing that wrapping the remaining spots where untranslated strings are echoed is adding "bloat".

I just did a quick count of how many times the t() function is used in the concrete folder: 3469 occurrences.

Now compare this with my patched folder, in which pretty much all strings are translated: 3589 occurrences.

- That's about 3% more (120 calls). Is that bloat?
- Does that cause new/more/significant problems?

I'm suggesting separating levels of concern: in the view layer, exactly one function should be used to signify that a string needs to be translated. The function should optionally be able to specify the context ("translate the term 'set' as in 'file set'"). It should also handle pluralization according to each language's rules (1, 2, many).

That's all it should do, and it should only be used in the view layer. Models should not worry about view-related issues, such as how a key should be displayed, the key being an English string "Edit content block" and the desired display being the string translated into German, "Inhaltsblock bearbeiten" (notice that "bearbeiten" means "edit" in German, so the word order is reversed).

I guess what I'm saying is this: I'm totally in favor of a clean solution, one that helps separate concerns, allows for writing tests, focuses on performance, and helps solve real-world problems.

For now, I don't see why we can't wrap those remaining 120 occurrences and be done with many months of waiting for a solution that's already available, and then refactor: remove the t() calls from controllers and models, implement a UI for providing translations for attributes, group names etc., and possibly implement whatever methods follow the most appropriate design pattern.

mlocati replied:
Yes, you're right. My sentence about improving t() was somewhat inappropriate. My apologies.
In fact, I took a deeper look into Zend, and yes, integrating another source in parallel to gettext is possible and wouldn't require much effort.
In particular, when a new locale is set, concrete5+Zend currently load the .mo file, decode it, and cache the decoded data.
Integrating data from the DB would be as easy as adding a new translation resource (in our case via the array adapter, with data from the DB) before caching. Easy, fast to implement, and without relevant performance issues.
Then concrete5 would have to reset that cache every time the DB data changes, but that shouldn't happen very often (only when attributes are added/modified/deleted).
The only thing that requires particular attention is the order/precedence of the translations: those from DB should take precedence.
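The merge-with-precedence step could look something like this. MergedCatalog is a hypothetical class, not the Zend_Translate API (in concrete5 the DB rows would instead be registered as an extra array-adapter resource); sources are listed lowest precedence first, so the database source goes last.

```php
<?php
// Hypothetical sketch: merge several translation sources per locale, later
// sources overriding earlier ones, and cache the result until reset().

final class MergedCatalog
{
    private $sources;       // list of callable(string $locale): array, lowest precedence first
    private $cache = [];

    public function __construct(array $sources)
    {
        $this->sources = $sources;
    }

    public function forLocale(string $locale): array
    {
        if (!isset($this->cache[$locale])) {
            $merged = [];
            foreach ($this->sources as $source) {
                // array_replace keeps keys as-is and lets later sources win.
                $merged = array_replace($merged, $source($locale));
            }
            $this->cache[$locale] = $merged;
        }
        return $this->cache[$locale];
    }

    /** Call after attributes etc. are added, modified or deleted. */
    public function reset(): void
    {
        $this->cache = [];
    }
}
```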

About the location of t(): I agree that they should all move to the view layer. My two pull requests and go in the wrong direction.
We should remove the t() calls from models and add them to the views. My only worry is how long such a pull request would take to be reviewed by the core team... Let's wait for a word from Franz / Andrew...

And about unifying t(), t2() and tc(): I see some problems there, since all of them take a variable number of arguments. For instance, t() accepts the string to be translated as its first parameter, and the other optional arguments are passed to the sprintf function inside t().
I think this has been done to simplify things and make them more readable: instead of sprintf(t('Hello %s'), 'John') we have t('Hello %s', 'John').
If we changed t() to accept an optional context, I'm afraid we'd need to make the sprintf explicit.
The only unification I can see is between t2() and t(). Instead of t2('%d page', '%d pages', $num, $num) we could adopt something like t(array('%d page', '%d pages'), $num, $num) or, even better, t(array('%d page', '%d pages', $num), $num), but I think that's quite a bit less clear than t2().
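To illustrate why the signatures are hard to merge, here are simplified stand-ins for the three helpers. They are hypothetical re-implementations for illustration only (not concrete5's code), placed in their own namespace, with made-up Italian catalog entries.

```php
<?php
// Hypothetical sketch: in each helper the trailing variadic arguments go
// straight to vsprintf(), which is what makes a shared signature awkward.
namespace TranslationSketch;

function messages(): array
{
    // Context and text are joined with "\x04", as gettext does for msgctxt.
    return [
        'Hello %s' => 'Ciao %s',
        "AttributeSetName\x04set" => 'insieme',
    ];
}

function plurals(): array
{
    return ['%d page' => ['%d pagina', '%d pagine']];
}

function t(string $text, ...$args): string
{
    $translated = messages()[$text] ?? $text;
    return $args ? vsprintf($translated, $args) : $translated;
}

function tc(string $context, string $text, ...$args): string
{
    $translated = messages()[$context . "\x04" . $text] ?? $text;
    return $args ? vsprintf($translated, $args) : $translated;
}

function t2(string $singular, string $plural, int $number, ...$args): string
{
    $forms = plurals()[$singular] ?? [$singular, $plural];
    // Two-form rule as in English/Italian; other languages need more forms.
    $translated = ($number === 1) ? $forms[0] : $forms[1];
    return $args ? vsprintf($translated, $args) : $translated;
}
```

If t() also took an optional context, a call like t('set', 'AttributeSetName') would be ambiguous: nothing tells the function whether the second argument is a context or a sprintf value.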
bmatzner replied:
Hi Michele,

Thanks, no problem.

As for unifying t, t2 and tc: I was just wondering why three functions exist to do the same thing; I didn't mean to suggest that refactoring them would solve the problem we have.

jshannon replied:
Just for the record, I haven't been arguing that wrapping everything in t() "adds bloat". (If anything, it might be faster than the db solution I have in mind.)

My problem with it is that it seems inelegant and can be confusing, not really fitting with the c5 paradigm.

Translation Today:
Right now t()s are (mostly) done on static strings. You can run the po-search (or whatever it's called) on your c5 source and it'll use some sort of magic to compile everything that has to be translated for you. You get a list, and you enter the translated strings.

If you're using a common language, you shouldn't even have to do this; you don't touch the po-editor... you just install the file from some online source.

If you're using c5 in one language, you're done. All the attributes, pages, etc that you create will be in that language. (There might be some problems with area names, etc, that can't be fixed.)

If you're using c5 in multiple languages, I think that's pretty easy/logical too (as far as it works). You have different "sections" and create content in the normal c5 way separately for each one.


Wrapping everything in t()s

Firstly, you've now got the day-to-day-admin (not necessarily the smart, developer-style admin) who's got to deal with translations.

You translate c5 pages by creating a new page in a different section, but let's say that page has an attribute that needs to be translated. For that you need some sort of spreadsheet-like po-editor.

Now the workflow is way different. Before you could install the .po and be done. Now there's a back-and-forth as you build the site, make edits, change attributes, etc.

And since they're dynamic strings, po-search won't work -- it'll ask you to translate "$at->getDisplayName()" or something. (I've seen some solutions that "log" strings not found at runtime so that you can add them later.)
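Here is why static extraction can't see these strings, illustrated with a toy Python scanner. This is not the real extraction tool (tools like xgettext tokenize PHP properly); the regex is a deliberate simplification that ignores escaped quotes and the t2()/tc() variants, and `scan` is a hypothetical name. A quoted first argument can be collected ahead of time, while a variable or method call only has a value at run time.

```python
import re

# Simplified pattern: a t( call whose first argument is either a quoted
# literal (groups 1-2) or anything else up to the next , ( or ) (group 3).
CALL = re.compile(r"""\bt\(\s*(?:(['"])(.*?)\1|([^,()]*))""")


def scan(php_source):
    """Split t() calls into extractable literals and run-time-only expressions."""
    static, dynamic = [], []
    for match in CALL.finditer(php_source):
        if match.group(1):
            # Quoted literal: its text is known without running the code.
            static.append(match.group(2))
        elif match.group(3).strip():
            # Variable or expression: the string only exists at run time.
            dynamic.append(match.group(3).strip())
    return static, dynamic
```

On `echo t($at->getDisplayName());` the scanner can only report the expression `$at->getDisplayName`, never the attribute name it will return.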

So now the site admin has to create something, copy the name (exactly!), then go to the spreadsheet-like interface and paste it in along with the translations. She should remember that the new name (even if it's in, e.g., Swedish) should be pasted in to the "English" column (the one with the 3469 English phrases already).

I just don't think this is a great solution. It's acceptable as a stopgap, or if c5 refuses to do anything big, but this isn't something I want to be using in 3+ years.

* Also, quoting of numbers can be misleading. I'd be curious how many times t() is run on a single page view. A lot of those 3469 probably run rarely (or even once per lifetime, like in the block/job/package installations). Whereas most of the stuff that's being discussed will be run more frequently -- maybe multiple times per page (attribute lists).

I agree that performance considerations shouldn't affect requirements. It's the job of smart programmers to take the business requirements and make them work, in a performant way. But performance should affect the eventual implementation details. That's why I suggested starting with the business requirements.
JohntheFish replied:
Relating to finding strings etc, or po-search as @jshannon calls it, with dynamic strings.

I don't know enough about translation to back any particular solution above, so what I am describing below is just an algorithm that may be of use for the t() based solution. It does not signify any personal backing.

- Have a learning mode switch where t() records any strings it does not already have for a particular language.

- Wander the relevant pages, or let others wander for you.

- Have a dashboard page that pre-populates a translation form/list for a particular language with t() not found for that language.

- Enter the translations against each & save.

- Once stable, switch off the learning mode.

If the overhead of recording is low/negligible, then maybe the learning mode switch is not relevant.
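The learning-mode steps above can be sketched in a few lines of Python. Everything here is hypothetical (the class, method names and sample strings); it only shows the shape of the idea: t() records misses per locale while the switch is on, a dashboard page would list `pending()`, and `save()` stores what the translator entered.

```python
class LearningTranslator:
    """Sketch of "learning mode": remember strings t() could not translate."""

    def __init__(self, catalogs, learning=True):
        self.catalogs = catalogs  # locale -> {source: translation}
        self.learning = learning  # the on/off switch from the first step
        self.missing = {}         # locale -> set of untranslated source strings

    def t(self, locale, text):
        translations = self.catalogs.get(locale, {})
        if text in translations:
            return translations[text]
        if self.learning:
            self.missing.setdefault(locale, set()).add(text)
        return text  # fall back to the source string

    def pending(self, locale):
        """What a dashboard page would pre-populate for `locale`."""
        return sorted(self.missing.get(locale, set()))

    def save(self, locale, entered):
        """Store translations entered on that page and clear them from the queue."""
        self.catalogs.setdefault(locale, {}).update(entered)
        self.missing.get(locale, set()).difference_update(entered)
```

If recording is cheap enough to leave on, `learning` can simply stay `True`, matching the last point above.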
mlocati replied:
Well, I think I know this well enough to describe what happens when Localization::setLocale and the t() function family are called.

On startup, setLocale (with the help of Zend) loads the full .mo file (the compiled version of the .po) and creates an in-memory table of source=>translation texts (think of it as an associative array). Once that associative array is built, it is cached so that future calls don't need to parse the .mo file again.

The t() functions basically do a simple lookup in that associative array.

What I'm proposing is to add attribute names (and the other dynamic strings) to that associative array. Instead of storing them in a .po/.mo file, we could store them in a database table (let's call it DynamicTranslations), and add the table data to the associative array before it's cached.

Let me make an example to be more clear.

concrete5 comes with the 'Width' file attribute. On a localized installation, the translation of 'Width' would be in the core .mo file, so it can be seen translated simply by a call to t().
When users add new attributes, we should implement an appropriate UI to allow specifying the English and the translated name of the attributes. The new localized names (not the en_US one) should then be stored in the DynamicTranslations table, and the associative array should be re-created by deleting the translation cache.
The DynamicTranslations table should also be updated when (for some obscure reason) users rename existing attributes (e.g. altering the translations of the 'Width' attribute).
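The proposed workflow (store the non-English names, then delete the cache so the array is rebuilt) might look like the following Python sketch. `DynamicTranslations` here is only an in-memory stand-in for the proposed table, and the method names and sample names are hypothetical; the English name is assumed to stay on the attribute itself, as described above.

```python
class DynamicTranslations:
    """Hypothetical in-memory stand-in for the proposed DynamicTranslations table."""

    def __init__(self, core_mo):
        self.core_mo = core_mo  # locale -> {source: translation} from the core .mo files
        self.rows = {}          # (locale, english_source) -> localized text
        self._cache = {}        # locale -> merged catalog, like the cached array

    def set_attribute_name(self, english_name, localized):
        """Save the non-English names for an attribute, then drop the cache."""
        for locale, name in localized.items():
            self.rows[(locale, english_name)] = name
        self._cache.clear()  # "deleting the translation cache"

    def _catalog(self, locale):
        if locale not in self._cache:
            merged = dict(self.core_mo.get(locale, {}))
            # DB rows are applied last so they override the .mo entries.
            merged.update(
                {src: text for (loc, src), text in self.rows.items() if loc == locale}
            )
            self._cache[locale] = merged
        return self._cache[locale]

    def t(self, locale, text):
        return self._catalog(locale).get(text, text)
```

Renaming a core attribute such as 'Width' is then just another `set_attribute_name` call whose rows override the core .mo entry.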

That said, it all seems quite clear to me, except how to implement the interface for adding/editing attributes...
bmatzner replied:
Hi James,

could you please clarify your proposal, I have a hard time understanding what you're suggesting, i.e. what is the "this" in "this isn't something I want to be using in 3+ years", and what is the "everything" in "Wrapping everything in t()s"?

jshannon replied:
I don't have much of a proposal, actually. I just don't like t()s. :)

This = the process I described above, with its back-and-forth of creating and translating, which involves a lot of copy & paste (or my "run-time logging" / John's "learning mode"). I also don't envision the eventual solution as being particularly c5-y. (Though I guess the po-editor could be made into a popup sort of thing...)

Everything = strings stored in the database as package / block / attribute / etc names, area names, etc.


Basically, I feel that anything stored in the database should be translated in the database. Like a page or a URL, which exist as different rows with a shared key, but a different "language" key. That way non-generic admin interfaces can be built to "create translated attribute names" or "create translated select attribute options". Eek... sounds complex, but "feels" a lot cleaner than the po solution.
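For comparison, the "translate it in the database" approach could look like this Python/SQLite sketch. It borrows the column layout of the AttributeKeysLoc table mlocati proposed earlier in the thread (types simplified for SQLite); the sample rows, the `attribute_name` helper and the en_US fallback are assumptions for illustration. Each attribute has one row per language sharing the same key, and lookup prefers the requested locale.

```python
import sqlite3

# In-memory database using the AttributeKeysLoc column layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE AttributeKeysLoc (
           aklAttributeKey INTEGER NOT NULL,
           aklLocale TEXT NOT NULL,
           aklName TEXT NOT NULL,
           PRIMARY KEY (aklAttributeKey, aklLocale)
       )"""
)
# One row per (attribute, language): same key, different language column.
conn.executemany(
    "INSERT INTO AttributeKeysLoc VALUES (?, ?, ?)",
    [
        (1, "en_US", "Width"),
        (1, "de_DE", "Breite"),
        (1, "fr_FR", "Largeur"),
        (2, "en_US", "Thumbnail Image"),
        (2, "de_DE", "Vorschaubild"),
    ],
)


def attribute_name(conn, ak_id, locale, fallback="en_US"):
    """Return the name in `locale`, else in `fallback`, else None."""
    row = conn.execute(
        """SELECT aklName FROM AttributeKeysLoc
           WHERE aklAttributeKey = ? AND aklLocale IN (?, ?)
           ORDER BY aklLocale = ? DESC
           LIMIT 1""",
        (ak_id, locale, fallback, locale),
    ).fetchone()
    return row[0] if row else None
```

The cost Franz raises below is visible even here: every translatable column in the schema would need a companion table like this one, plus the joins to read it.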
frz replied:
So this issue is going to be the heart of our new Weekly Community Advocacy Check in meeting. (I know! Shocking level of process for us, right?)

Andrew and Ryan will likely have more detailed technical thoughts than I, but here's what I'm getting out of this conversation so far:

1) Performance hit from t() isn't a huge concern for a variety of believable reasons.

2) There's really no 3rd choice. Either you wrap everything in t()'s or you build multi-lingual support into your database tables. Given that choice, I'd push for t(). It seems odd to add all the cruft of multiple tables to concrete5 for just these interface elements, we're still using t() elsewhere and we still push for multiple trees for different languages. If we had architected concrete5 to be 1-n on any text stored in the database it'd make more sense to keep exploring this direction. Since we didn't, I'm afraid we'd constantly be catching up and missing some little tidbit that now needs its own table and join. (it's not just adding a column for 1 extra language, but 1-n you have to worry about)

3) There's a new concern (to me) of context around translated strings. That makes sense now that I'm aware of it, but it's certainly going to be difficult for us to do correctly. It's hard for us brazen English speakers to really wrap our minds around how subjective our language is. Even though I'm intimate with certain idiosyncrasies like "free as in freedom or free as in beer", it's unlikely we will be able to predict which words need context added across the board. We will need some type of process in Git so someone wiser than us can verify that pull requests get this stuff right.

Am I missing anything?
Am I not accurately understanding the consensus?
jshannon replied:
Regarding the context "problem", I wonder if there's any usefulness to having an "English -> English translation".

Right now t("free") returns "free" translated into the current language if a translation exists, and plain "free" if there is no translation or the language is English.

For long strings "free as in freedom or free as in beer", there's enough context that a translation in another language should be able to get it right. There's not a lot of ambiguity in that phrase. If it gets translated once, and then used somewhere else a year later, the original translation should still "work".

However, if you just do t("free"), and that gets translated, it's possible that another t("free") somewhere else on the site will have the incorrect translation.

One solution to this is that ALL strings are translated. In other projects I've done stuff like t("ECOMMERCE_BUTTON_PRICE_FREE") => "free" (english) in one place and t("NAVIGATION_LINK_SPEECH_TYPE_FREE") => "free" (english) in another. This makes initial programming difficult, updates difficult, and requires a lot of unnecessary translation records (you end up with 20 keys that translate to "free" when you might only need 3).

So my suggestion (if we are going with t()) is a hybrid between today's approach and the "all strings translated" method. A naive American will put t("free") in places, and that will _ALWAYS_ go through the translation engine, which also looks for an English translation. 90% of the time one won't exist, so we get "free". However, a translator comes along, translates it in one place, and realizes it exists somewhere else with a different meaning.

So the solution is to change the original source locations to something less ambiguous, like t("free (ecom button)") or t("free as in beer"). The translator then adds their own appropriate language translation, PLUS the english "translation" ("free").
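The hybrid can be sketched in Python as follows, assuming hypothetical catalogs in which the disambiguated source strings are the keys and English gets its own catalog mapping them back to the plain word. The key point is that every lookup goes through a catalog, English included, and anything without an entry falls back to the source string as today.

```python
# Hypothetical catalogs: source strings in code are the disambiguated keys;
# English has its own catalog that maps them back to plain "free".
CATALOGS = {
    "en_US": {
        "free (ecom button)": "free",
        "free as in freedom": "free",
    },
    "de_DE": {
        "free (ecom button)": "kostenlos",
        "free as in freedom": "frei",
    },
}


def t(locale, text):
    """Every call goes through the catalog, English included."""
    return CATALOGS.get(locale, {}).get(text, text)
```

Unambiguous strings like t("Save") need no English entry at all, so the English catalog only grows where a translator has flagged a real ambiguity.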
bmatzner replied:
Hi Franz,

thanks. That sums it up for me.

I'd appreciate it if you took the "translate strings in the view layer only" paradigm into account.

Github process: I understand the core team favors atomic pull requests. Since most of the remaining missing backend translations fall into groups (attributes, group set names etc.) would that be a small enough entity to create a pull request for?

Verification: I happen to work on multi-language and translation projects frequently (English, German, French, Russian), so I'd be happy to provide comments on such commits from various language perspectives. Perhaps we can find someone who's familiar with Arabic, Chinese, Japanese, to give more feedback. The other active ones will probably be happy to supervise that, too.

Finally: it would be great if you could set a timeframe or designated release for which we could run a sprint/bughunt related to localization. Have one or two days of joint hacking, IRC discussion, closing pull requests, and we might be done with the whole topic in no time.

mlocati replied:
I guess the main problem is not t() vs. the database. Judging from the (long) discussion above, we could improve t() by also giving it access to translations stored in the database.
That's necessary to take into account translations that are not fixed (like attribute names) but could be updated via a still-undefined user interface.

And about translation contexts: developers surely don't have to know every language in the world, so they shouldn't worry too much about translations. Pointing out problems should be left to translators, who (should) know what they're doing. What's important is that developers listen to translators' feedback (like the one from Patrick Heck: ).
We can't expect translators to know how to use GitHub, but they could use the standard concrete5 bug reporting (since, from my point of view, an ambiguous text that can't be translated is a bug).
tsilbermann replied:
Hi guys,

just to show the view from the user perspective.
I often tell my new clients: "the translation is not complete yet, but it's a great CMS". It's easy to handle etc. This does not apply to translations. There are so many places where the non translated words are visible (attributes, composer, blocks etc.). The same for people who want to try concrete5. It gives people the feeling c5 is not very mature (as Michele and Remo mentioned).
Beginners mostly install concrete5 with sample content. That means there are already a few attributes, but untranslated (width/height in file properties, Thumbnail image in composer etc.). Why make it easy for people to install demo content, with a nice GUI and an easy-to-use system, and on the other hand confuse them with untranslated things (which are always there, like the content, image navigation blocks etc.)?

I can't contribute anything on the other (technical) stuff. But I hope this discussion continues in the same inspired way and that we have a nice working solution in the near future :-)

Best regards

andrew replied:
Lots of good, well-thought-out points in this thread. Here are my own thoughts. Please let me know if I'm missing something. Here's where I think I see us heading:

1. I don't think we want to provide interfaces in the Dashboard for all the localization. (e.g. storing the actual strings in separate tables, presenting multiple form fields for each language, etc...) Not only would this be a lot of work, it'd also raise lots of questions around the fact that we don't do it everywhere. If we're going to add multiple multilingual outputs for attributes and groups, why not on page names? etc...

2. I think wrapping the t() strings around _some_ dynamic content is probably what we're going to have to do, and while I was the first to say it didn't feel right earlier, I think it's the best combination of getting a solution out there, minimizing bugs and minimizing the amount of work required. I think bmatzner's list is an excellent start.

3. This is all fairly useless without some way to access these strings for a multilingual site. To that end, we are going to explore integrating the free Internationalization add-on into the core. Even if we don't actually integrate the add-on, we will integrate the portion of the Enterprise Internationalization add-on that takes care of parsing code files, and providing an interface for translating/generating .po and .mo files.

4. Along with #3, we're going to provide some way to get dynamic content into these files, the same way we parse static PHP content. This would be available for third-party packages too. (e.g. run a job, and all your attributes, groups and other dynamically translated items get added into your various site .po files.)
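One way such a job could emit dynamic strings is as gettext PO template entries, sketched below in Python. The entry format (`#:` reference comments, optional `msgctxt`, `msgid`, empty `msgstr`, with backslash, quote and newline escaped) follows the standard PO syntax; the header entry is omitted, and the context labels, references and `build_pot` helper are hypothetical, not what the core job will actually produce. Duplicate strings are merged into one entry with several references, which is what translators expect.

```python
def po_escape(s):
    """Escape a string for use inside a double-quoted PO field."""
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_pot(entries):
    """entries: iterable of (context or None, msgid, reference label).

    Identical (context, msgid) pairs are merged into one entry whose
    reference comments list every place the string came from.
    """
    merged = {}
    for context, msgid, ref in entries:
        merged.setdefault((context, msgid), []).append(ref)
    blocks = []
    for (context, msgid), refs in merged.items():
        lines = [f"#: {ref}" for ref in refs]
        if context:
            lines.append(f'msgctxt "{po_escape(context)}"')
        lines.append(f'msgid "{po_escape(msgid)}"')
        lines.append('msgstr ""')  # left empty for translators
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

A job would collect the tuples from attribute keys, groups and so on, then merge the output into each site .po file with the usual gettext tools.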

5. As far as using t() around dynamic content, we will only use this t() call in views, e.g. in elements, in the single pages themselves, etc... Exceptions to this include block types, which can't be edited. Any content that is editable through the Dashboard can't have its t() call at the class level. It has to be in the view.

6. We would love to make better use of the tc() function, but we're not really sure where the pain points lie. I know that mlocati has submitted a number of approved pull requests that use this method, and we're accepting them. We're going to need a member of the international community to really let us know when a particular comment is useful or not. This isn't something we're really equipped to do on our own – but we'd be happy to accept pull requests for it. Remo – would this be something you'd be interested in?

Anyway, just thought I'd continue the conversation and let you all know how we're planning to proceed in the core with this, and with the pull requests that make use of these methods. Thoughts?
Remo replied:
Long story short: I'll try to help where I can, but even with the experience I have, I never found a solution to these problems without a downside.


About point 6 in more detail: I am interested in getting internationalization to a more advanced level, but I have to admit that I have often struggled with context-related translations before.

The reason for this is mostly that it's not always clear where a context is needed. Words like "to" or "no" are pretty annoying to translate, but I only know this because I speak German and English and have a very basic knowledge of a few more languages. I happened to be part of a bigger team that rolled out our application in China, Korea and Japan, and guess what: I don't speak any of their languages, and contexts had to be changed due to feedback from locals.

We ended up doing what James described above. t('SHIPPING_METHOD_TO') instead of t('to'). That's imho rather ugly but it works and when there's a deadline, you sometimes have to find a solution, even if it's not perfect.

Also keep in mind that a fixed context like "AttributeName" or "AttributeValue" might not work well. We often have a value like "No" in a select attribute. This value could be translated either to "Nein" (No) or to "Nr" (short for "Nummer" = "Number").
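The "No" example shows how a context-aware lookup separates identical source strings. This Python sketch keys entries the way GNU gettext does in .mo files (msgctxt, an EOT character `\x04`, then msgid); the catalog and its context strings are made up, and the context-first argument order of `tc` is an assumption meant to resemble the tc() discussed above.

```python
# GNU gettext stores a context-qualified entry in a .mo file under the key
# msgctxt + "\x04" (EOT) + msgid; this sketch reuses that convention.
CONTEXT_SEPARATOR = "\x04"


def context_key(context, text):
    return f"{context}{CONTEXT_SEPARATOR}{text}"


# Hypothetical German catalog: the same source string "No" has two entries,
# distinguished only by their (made-up) context strings.
CATALOG_DE = {
    context_key("Yes/No answer", "No"): "Nein",
    context_key("Abbreviation of Number", "No"): "Nr",
}


def tc(context, text, *args):
    """Look up `text` under `context`; fall back to the untranslated text."""
    translated = CATALOG_DE.get(context_key(context, text), text)
    return translated % args if args else translated
```

A generic context such as "AttributeValue" would map both meanings to a single key, so only one of "Nein" or "Nr" could ever be stored, which is the problem described above.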
bmatzner replied:
So, now that some sort of conclusion is in place...

1. Who will implement those changes? Will the core team, or will you accept pull requests?
2. If PRs will be accepted, are there general rules as to what those PRs should look like (in terms of number of commits, scope of changes etc.) so that they can be accepted?
3. When is a time when you would be able to review and merge the PRs, so there won't be a large list of long-running PRs?
4. In a previous post, I suggested doing some sort of bug-hunt, during which we can fix, test and discuss, and most importantly, close issues, all within a few hours, possibly. No one responded to that, is that a bad idea? Since the core team are the ones to accept changes, we'll need your participation...
frz replied:
1) Yes. Pull requests will be reviewed and accepted on this stuff much sooner than we will do anything on it.

2) Same deal as always on PRs: simpler is always easier to accept, although I recognize there's some deeper stuff here.

3) Beginning of next week we will sit down and go through each and every existing PR for internationalization and either accept it or comment on it. There's quite a few out there.

4) I like the idea of a swarm, but I think it's less about being awake at the same time (since we're in different time zones) and more about having a good list of things to work from. Perhaps a Trello board for this effort would be helpful?
mlocati replied:
Just for the records, I just submitted the how-to you can find attached to this post. It describes the t() functions family and the DateHelper.
mlocati replied:
"Just for the records v2": I submitted another how-to about the generation of .mo files.