Google crawls URLs with index.php

Hello guys,

I've just realised something about all concrete5 websites.
I'm looking at the analytics account for one of our websites, and it seems that some page paths start with /index.php/ while others don't. This is creating a problem with analytics: for example, there are a number of visits to /index.php/appointment-thank-you that aren't included in the goal conversions.

There should be a way to stop Google from listing the index.php versions of the page paths. At the moment, Google is crawling both the index.php version and the non-index.php version of each page, and there should be a way to avoid this.

I think you will always be able to load both versions, but we should be able to tell Google to crawl only the ones without index.php. That way no one will ever visit the index.php versions, right? Could you please help me fix this problem? Is there any way to stop Google from crawling the pages with index.php, and also to redirect URLs with index.php to the correct URLs without index.php?

Thank you

Aida

drbiskit replied:
1 - First, switch on pretty URLs in the backend:
yoursite.com/index.php/dashboard/system/seo/urls/

2 - Then make sure that the rewrite code provided on that page is included in your .htaccess file.

3 - add this to your config/site.php file:
define('URL_REWRITING_ALL', true);

4 - Update your sitemap:
yoursite.com/index.php/dashboard/system/optimization/jobs/

5 - Check the sitemap and make sure only the correct URLs are included:
yoursite.com/sitemap.xml

6 - If it's correct, submit your sitemap to your Google Webmaster Tools account:
http://www.google.com/webmasters/...
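For step 5, if the sitemap is long, a small script can do the checking for you. Here's a rough Python sketch (my own, not something concrete5 ships) that parses the standard sitemaps.org XML and lists any <loc> entries that still contain index.php. The function name and the sample URLs are made up for illustration; you'd swap in the real contents of yoursite.com/sitemap.xml.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, used by sitemap.xml files
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample; replace with the real contents of yoursite.com/sitemap.xml
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://yoursite.com/appointment-thank-you</loc></url>
  <url><loc>http://yoursite.com/index.php/contact</loc></url>
</urlset>"""


def urls_with_index_php(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL in the sitemap that still contains 'index.php'."""
    root = ET.fromstring(sitemap_xml)
    # <urlset> -> <url> -> <loc>, all in the sitemaps.org namespace
    locs = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in locs if "index.php" in url]


if __name__ == "__main__":
    bad = urls_with_index_php(SAMPLE_SITEMAP)
    if bad:
        for url in bad:
            print("Still using index.php:", url)
    else:
        print("Sitemap looks clean")
```

If it prints "Sitemap looks clean" for your real sitemap, the sitemap itself isn't the source of the index.php URLs.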

That should hopefully rectify the situation.

You can also request that specific URLs be removed from Google's index using Google Webmaster Tools.

If you have done steps 1-4, and you are still getting 'index.php' in your urls, something else isn't right - but let's see how you get on with the above first.
northsid replied:
Hello and thanks for your help.

The thing is, I have checked thewebsite/sitemap.xml and there is no index.php in any of the URLs, so I was wondering whether concrete5 automatically creates an index file with index.php in it. Could that be the case?

Thanks :)
drbiskit replied:
Ah okay... No, not as far as I know. Are you 100% sure that none of the actual links on your site use 'index.php' in their URL paths? I know that some navigation-related add-ons use them by default and may need some tinkering to fix.
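One quick way to check is to save the HTML of your pages and scan it for links. Below is a rough Python sketch using the standard library's HTML parser; the function and class names and the sample markup are made up for illustration. It only looks at href, src and action attributes, so links built in JavaScript won't be caught.

```python
from __future__ import annotations

from html.parser import HTMLParser


class IndexPhpLinkFinder(HTMLParser):
    """Collects link attribute values that still point at index.php URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.offending: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Only attributes that carry URLs; value is None for bare attributes
            if name in ("href", "src", "action") and value and "index.php" in value:
                self.offending.append(value)


def find_index_php_links(html: str) -> list[str]:
    """Return every href/src/action value in the markup that contains 'index.php'."""
    finder = IndexPhpLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.offending


if __name__ == "__main__":
    # Hypothetical nav fragment; in practice, feed in the saved HTML of each page
    page = (
        '<nav><a href="/about">About</a> '
        '<a href="/index.php/appointment-thank-you">Thanks</a></nav>'
    )
    print(find_index_php_links(page))
```

Running it over each page's HTML should point you at whichever block or add-on is still emitting index.php links.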
northsid replied:
Yes, I'm quite sure. I have checked all the URLs and there is no index.php in any of them! I've got no idea how that happened. Anyway, I think I'm gonna submit the sitemap again just to make sure. Thanks for your help.
northsid replied:
Is there any way we can make sure that Google doesn't index URLs with index.php, for example in robots.txt or something?
drbiskit replied (Best Answer):
Hmmm, okay. I don't think you can do this via robots.txt (although I'm willing to be proved wrong here!), and as far as I understand, disallowing a URL via robots.txt will not necessarily prevent it from appearing in search results: there are ways the bots get around that, e.g. if they find a link to the URL embedded on another site.

You can add some rules to your .htaccess file (using mod_rewrite's RewriteCond and RewriteRule directives) to force those pages to do a permanent (301) redirect to the canonical URL in each case:

# Redirect index.php Requests
# ------------------------------
RewriteCond %{THE_REQUEST} ^GET.*index\.php [NC]
RewriteCond %{THE_REQUEST} !/system/.*
RewriteRule (.*?)index\.php/*(.*) /$1$2 [R=301,L]
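If you want to sanity-check what those three lines do before deploying them, here's a rough Python model of the rules. It isn't Apache, just the same regexes applied in the same order. It assumes the site sits in the document root (so RewriteRule sees the path without its leading slash) and that no other rewrite rules run first; redirect_target is a hypothetical helper name.

```python
from __future__ import annotations

import re

# The .htaccess lines above, translated to Python regexes
COND_GET_INDEX_PHP = re.compile(r"^GET.*index\.php", re.IGNORECASE)  # [NC] flag
COND_SYSTEM = re.compile(r"/system/.*")  # used negated, like the "!" in RewriteCond
RULE = re.compile(r"(.*?)index\.php/*(.*)")


def redirect_target(request_line: str) -> str | None:
    """Return the 301 Location path for a request line, or None if no redirect.

    request_line mimics Apache's %{THE_REQUEST}, e.g. "GET /index.php/about HTTP/1.1".
    Assumes the site is in the document root, so the per-directory path that
    RewriteRule matches is the URL path without its leading slash.
    """
    if not COND_GET_INDEX_PHP.search(request_line):
        return None
    if COND_SYSTEM.search(request_line):
        return None
    # Second token of the request line is the URL; RewriteRule only sees the path part
    url = request_line.split()[1]
    path, _, query = url.partition("?")
    match = RULE.search(path.lstrip("/"))
    if match is None:
        return None
    target = "/" + match.group(1) + match.group(2)
    # Apache re-appends the original query string to the substitution by default
    return target + ("?" + query if query else "")


if __name__ == "__main__":
    for line in (
        "GET /index.php/appointment-thank-you HTTP/1.1",
        "GET /appointment-thank-you HTTP/1.1",
        "GET /index.php/dashboard/system/seo/urls/ HTTP/1.1",
    ):
        print(line, "->", redirect_target(line))
```

With these rules, a request for /index.php/appointment-thank-you gets a 301 to /appointment-thank-you, while requests whose request line contains /system/ (such as the dashboard URLs) are left alone.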

To fix the issue with Google, I would also recommend using the URL removal tool in Google Webmaster Tools; at the very least, that should help you get the correct set of URLs into the Google SERPs.
http://www.google.com/webmasters/...

Also take a look at this Google post about duplicate content, it may be helpful:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer...
northsid replied:
Perfect, thank you so much :) The redirection is working very well :)