Search Engine Optimization

Another small issue with "Insert Link to page" in content block edit.
While doing some SEO I discovered that Google has indexed all my pages both with and without a trailing "/", and I believe my PageRank dropped because of it (duplicate content on every page).
Investigating, I found that this comes from a difference between the autonav block and the content block:
- the autonav inserts links like www.yoursite.com/yoursection/...
- the content block inserts links like www.yoursite.com/yoursection...

They differ only by that small trailing "/". But Google treats the two links as two different pages!
I think both blocks should use the same URL syntax so that search engines index the site better.

This problem affects all concrete5 installations!

What do you think? Can it be done?

energywave
 
Tony replied:
You could probably add something to your header so that, if the URL includes index.php?cID=, it prints:
<meta name="robots" content="noindex, nofollow" />
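A minimal sketch of the check Tony describes, in plain PHP. The function name robotsMetaForRequest is hypothetical and not part of concrete5; it only looks for index.php?cID= in the request URI and returns the meta tag in that case.

```php
<?php
// Hypothetical helper, not a concrete5 API: returns the robots meta tag for
// requests that use the index.php?cID= form, and an empty string otherwise.
function robotsMetaForRequest(string $requestUri): string
{
    if (strpos($requestUri, 'index.php?cID=') !== false) {
        return '<meta name="robots" content="noindex, nofollow" />';
    }
    return '';
}

// In a theme header this could be called as:
// echo robotsMetaForRequest($_SERVER['REQUEST_URI']);
```

This only keeps the cID form of a page out of the index; it does nothing about the trailing slash variants.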
energywave replied:
Thank you Tony, but I'm not talking about links reported as /index.php?c=xx here (I describe that problem in my other post at http://www.concrete5.org/community/forums/usage/insert_link_bug_in_... [UPDATE: that thread is closed: it wasn't a bug!]).

Here I'm talking about links with and without a trailing slash, like these:
/section/page/ and /section/page
The first, with the trailing slash, comes from the autonav block, while the second comes from a link inserted in the content block with the WYSIWYG editor.
Tony replied:
Wouldn't you think Google would be smart enough to realize that the same address with and without a trailing slash is the same thing? That's almost like Google thinking page/ and page/#anchor or page?somevar=1 were different pages. How do you know they're being treated as different pages? (Because this doesn't seem right.)
energywave replied:
I signed up for Google Webmasters and saw it under Diagnostics -> Content analysis -> Pages with duplicate meta descriptions.
For every page I've written a meta description for, the list shows both the version with and the version without the trailing slash.
If you want proof, try searching Google for this:
site:www.poliware.com prodotti progetti custom
and then click the link at the bottom, "retry the search including the omitted results" (I've translated the Italian text, so I'm not sure that's the exact wording).
Now look at the second and third links in the results! They are:
/prodotti/software/progetti-custom/
and
/prodotti/software/progetti-custom

treated as two different pages!

Try signing up for Google Webmasters (it's free) and see for yourself on your own site!
We absolutely must change this in concrete5!
Tony replied:
That's really, really weird. Does anyone else have any thoughts on this?

Maybe the trailing slash should be added automatically in concrete's View::url() method? I will run this by Andrew.
nolmscheid replied:
I have NEVER heard of that before (Google treating it as two URLs), although I do see the inconsistency: concrete5 puts the trailing slash in the autonav but NOT in the inserted link.

Your proof does show that they are being indexed separately as well.
nolmscheid replied:
The Page List block does not add the trailing slash either. It seems the autonav is the only block that does.
energywave replied:
Even though I've never used PHP, I managed to work out that this happens in the getURL function in controller.php in concrete/blocks/autonav (that wasn't difficult :)
I've modified it to remove the trailing slash.
I've uploaded it to my site and now all the links on it are consistent!
I think this should be merged into the main trunk of the concrete5 svn for the next release!

For people wanting to apply this:
1 - Create the autonav folder inside the /blocks folder (result: /blocks/autonav)
2 - Extract the controller.php file from the attached zip into that folder (blocks/autonav).
3 - DONE! Test it!

WARNING
IN THE FUTURE, when upgrading to a newer concrete5 version, remember to remove that file!

[ATTACHED FILE REMOVED: LOOK AT MY NEXT POST]
Remo replied:
Links without a trailing slash generate a redirect. That's not a complicated thing, but since it's easy to avoid, I'd think about adding a slash everywhere...
Remo replied:
This page explains what I mean in a bit more detail: http://httpd.apache.org/docs/2.0/mod/mod_dir.html...
energywave replied:
But with pretty URLs we are rewriting URLs: does the document you linked still apply with URL rewriting? Doesn't everything get passed to index.php?

Anyway, I made that change because it was the only thing I could do: the autonav was pretty simple for me to change.
I tried to change the other links so that all of them get a trailing slash, but I couldn't find the files and functions to change! (I know nothing about PHP... damn, I must learn it!)
What do you think?
Remo replied:
It is related to rewriting, but not the mod_rewrite you're probably thinking about.

Rewriting URLs is one thing, but there are things a web server rewrites on its own.

/dir redirects to /dir/ no matter what rewrite rules you're using.


So yes, it's a redirect, but not one of those you usually put in .htaccess.
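To make the /dir to /dir/ behaviour concrete, here is a rough sketch of the same kind of redirect done at application level in PHP. It is not how mod_dir is implemented, and the function name trailingSlashRedirectTarget is hypothetical; it assumes the request is for a page path rather than a file.

```php
<?php
// Hypothetical helper: returns the URL to redirect to when the path part of
// $requestUri lacks a trailing slash, or null when no redirect is needed.
function trailingSlashRedirectTarget(string $requestUri): ?string
{
    // Split off the query string so the slash is added to the path only.
    $parts = explode('?', $requestUri, 2);
    $path = $parts[0];
    $query = isset($parts[1]) ? '?' . $parts[1] : '';

    if ($path === '' || substr($path, -1) === '/') {
        return null; // already in the slash form
    }
    return $path . '/' . $query;
}

// Usage sketch: a permanent redirect, so only one form of the URL stays in use.
// $target = trailingSlashRedirectTarget($_SERVER['REQUEST_URI']);
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

Each link without the slash costs one extra request this way, which is why generating the slash form in the first place is cheaper.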
energywave replied:
OK Remo, I've thought it over and, even though I'm not sure that mod_dir acts before mod_rewrite (but who cares! :), I agree with you that all the links should have the trailing slash.
If we do that we can even use relative links (which won't work without the slash), and it's more correct in general.
The problem is: Remo, you certainly know how to add the trailing slash to the URLs generated by the content editor and the Page List, but I don't! I have programming know-how in other languages but not PHP; it seems a world apart to me!
Can you please fix that and post it here? Hopefully it will be merged into the main trunk in the future...
If not, can you please point me to the files to change? The content block in particular is really difficult for me to understand...

Thank you in advance :)
energywave replied (1 attachment):
Here is the fix again: I've corrected a small thing that prevented the autonav from linking to the home page! Now it works OK, but the home page link still has the trailing slash.
energywave replied:
At first glance you might think this is not a problem at all, all this fuss over one little slash, but for search engines it means less trust in our pages, and for us it means worse rankings for our concrete5 sites. It's a real pity!
I'll look again at how to add the trailing slash to content editor links and to the Page List block. The second one shouldn't be difficult, I believe, but the content editor is really difficult for me to understand!
Isn't there anyone who knows the concrete5 structure and PHP well and wants to make the change? Or if you have a different opinion, please write it here and we can talk about it!
Remo replied:
I've lost track of where I still owe an answer. There are just too many topics in these forums that I'm involved in...

Open this file: concrete/blocks/content/controller.php

The method replaceCollectionID should look like this:

private function replaceCollectionID($match) {
   $cID = $match[1];
   if ($cID > 0) {
      $path = Page::getCollectionPathFromID($cID);
      if (URL_REWRITING == true) {
         $path = DIR_REL . $path . '/';
      } else {
         $path = DIR_REL . '/' . DISPATCHER_FILENAME . $path . '/';
      }
      return $path;
   }
}
Remo replied:
It's pretty much the same with the page list.

Open concrete/helpers/navigation.php
getLinkToCollection should look like this:

public function getLinkToCollection(&$cObj, $appendBaseURL = false, $ignoreUrlRewriting = false) {
   // basically returns a link to a collection, based on whether or not we have
   // mod_rewrite enabled, and the collection has a path
   $dispatcher = '';
   if (!defined('URL_REWRITING_ALL') || URL_REWRITING_ALL == false) {
      if ((!URL_REWRITING) || $ignoreUrlRewriting) {
         $dispatcher = '/index.php';
      }
   }
   if ($cObj->getCollectionPath() != null) {
      // changed line: the trailing slash is appended to path-based links
      $link = DIR_REL . $dispatcher . $cObj->getCollectionPath() . '/';
   } else {
      $_cID = ($cObj->getCollectionPointerID() > 0) ? $cObj->getCollectionPointerOriginalID() : $cObj->getCollectionID();
      $link = DIR_REL . '/' . DISPATCHER_FILENAME . '?cID=' . $_cID;
   }
   // ... the rest of the method is unchanged ...
}


In this case, I'm not sure what the best way to fix it is. I wanted to fix getCollectionPath, since that is the method that gets called all the time. But since I'm not familiar with every line of code in c5, I thought it safer to fix navigation.php only.

The actual problem is that the table "PagePaths" already contains lots of entries without a trailing slash...

This fix isn't future-proof! It works, but use it only if you know what you're doing.
energywave replied:
Wow! Sorry for posting again to ask for the changes, but... hey, thank you very much! I know you're involved in 1000 threads ;)
This evening I'll test your changes and thank you again :)
However, I believe this should be merged into the main trunk or taken into account for the next release. It's more important than it appears, don't you think?
Remo replied:
Yes, I'd like to see that kind of change in the trunk, but what I did is more of a hack than a fix.

I'd prefer having the correct URLs in PagePaths, and that needs a few more changes.

I guess you/we have to wait for the core team to fix this. I could help write a real fix, but only if I knew that it would get merged.
frz replied:
stick it in a patch file and we'll totally include it. ;)

-frz
Remo replied:
It would probably also need an upgrade script, because I'd have to update the data in PagePaths.
energywave replied:
Thank you Remo, I've put your fix on my concrete5-powered site and I'm now ready to start on some SEO while waiting for the next version of concrete5 to come out!
Thank you again!!!
The only strange thing is that I had to modify the concrete/helpers/navigation.php file directly. Putting the changed file in helpers/navigation.php had no effect, whatever I tried. But who cares, the next version will not need it (at least I hope...)
andrew replied:
These are all good points. We're addressing them in our next version.

1. The View::url() method should add a trailing slash to all URLs. This covers URLs like "/login/", internal form posts, links to files that use the download_file/number/ syntax, and so on.

2. URLs automatically inserted into content blocks should have trailing slashes.

3. The getLinkToCollection method is modified to add a trailing slash, fixing issues with the page list and other blocks that use this method.

Additionally, while working on this I noticed some mild to moderate bugs with image/link substitution in the content block, which have also been fixed.
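As an illustration of point 1, here is a small sketch of a URL builder that always ends in a trailing slash. It is not the actual View::url() code: the function name urlWithTrailingSlash is hypothetical, and DIR_REL and the dispatcher prefix are left out to keep it self-contained.

```php
<?php
// Hypothetical stand-in for a View::url()-style helper: joins an action and
// optional extra segments into one path that always ends in a single slash.
function urlWithTrailingSlash(string $action, string ...$segments): string
{
    $clean = [];
    foreach (array_merge([$action], $segments) as $part) {
        // Strip slashes at both ends so segments never produce "//".
        $part = trim($part, '/');
        if ($part !== '') {
            $clean[] = $part;
        }
    }
    if (count($clean) === 0) {
        return '/'; // the home page
    }
    return '/' . implode('/', $clean) . '/';
}

// urlWithTrailingSlash('/login') gives "/login/"
// urlWithTrailingSlash('/download_file', '12') gives "/download_file/12/"
```

Trimming each segment first means the result is the same whether or not the caller already included a trailing slash.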
energywave replied:
This is the kind of good news I like! ;) Thank you Andrew, that will make all the SEO people happy :)
energywave replied:
Andrew, I've noticed that the URLs in sitemap.xml don't have the trailing slash either!
Maybe that was already fixed by the changes to View::url()?
I hope you can get this into the next maintenance release of concrete5 :)

Thank you!
jessicadunbar replied:
Hey Guys,
I just found out about the URL issue today. I was wondering whether this is going to be a patch or part of the next version of C5. Or is it already updated, and we need to make some adjustments on our end?
Thanks in advance.

Jess
ssnetinc replied:
Greetings,
I don't know if anyone is still following this topic. It's March 2014 and this issue was obviously not fixed in the core. Google was indexing three versions of many of my pages:
/how-vcheck-works/30-day-guarantee/
/how-vcheck-works/30-day-guarantee
/index.php/how-vcheck-works/30-day-guarantee/
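A minimal sketch of how the three variants above could be collapsed into one canonical path, for example to build a redirect target or a canonical link. The function name canonicalPagePath is hypothetical, not a concrete5 API, and choosing the trailing slash form without index.php as the canonical one is an assumption.

```php
<?php
// Hypothetical helper: maps the indexed variants of a page path to a single
// form, without the /index.php dispatcher and with exactly one trailing slash.
function canonicalPagePath(string $path): string
{
    // Drop a leading /index.php dispatcher segment if present.
    if ($path === '/index.php' || strpos($path, '/index.php/') === 0) {
        $path = (string) substr($path, strlen('/index.php'));
    }
    // Remove any trailing slashes, then add exactly one back.
    return rtrim($path, '/') . '/';
}

// All three variants listed above map to
// "/how-vcheck-works/30-day-guarantee/".
```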


I found another post that will hopefully fix the index.php/ issue:
http://www.concrete5.org/community/forums/customizing_c5/additional...

However, having read to the bottom of this topic, the suggested fix won't apply because my /concrete/blocks/content/controller.php doesn't contain the section mentioned.

Was there ever a fix for this? I'm using concrete5 version 5.6.2.1.