htaccess refining / SEO duplicate content

Hi,

I use the recommended pretty URL setup, with the following code:

...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [NC]
...


Now that I am refining the SEO on my page, I wonder how I should modify the .htaccess code so that any URL requested without a trailing slash is completed with one.

Any suggestions?

2BitMarv
 
jbx replied:
Whilst I agree that it is good for all your URLs to end with a trailing slash, I would recommend against forcing it in the .htaccess.

The problem is this:
User tries to visit http://www.example.com/about
Your server notices there is no trailing slash, so performs a 301 redirect to http://www.example.com/about/

You now have 2 hits on your server for the same page, which results in more overhead than simply serving up the page without the trailing slash, plus misleading analytics.

I think the best solution here is simply to make sure your autonav and sitemap.xml always output URLs with a trailing slash, which will hopefully minimize any requests that are missing it.

Having said all that, I have a nagging memory about Google and trailing slashes that may be nothing, but may be important in this discussion. I'll keep looking and see if I can remember what it is... I may have covered it with the 301 bit above...

Jon
jcrens8392 replied:
This won't affect your analytics, as the redirect happens before any scripts are executed (including C5 or Google Analytics, if you're using that).

Plus, traffic sources like Google, if they're sending traffic to those non-trailing-slash urls, will update the URL in their index to the trailing-slash version after some time, so your redirects will decrease over time in that case.

The only way I could see this becoming a problem as far as server load is concerned is if you update your urls immediately after submitting what will become a front-page Digg or something. Then when you get swamped with traffic, you could have problems.

Although I haven't tested whether two urls, one with trailing slash and one without, really affect how your link juice is being passed, it's an easy fix and it'll save you from that possibility, which Google has alluded to in the past.
jcrens8392 replied:
Oh, I guess I should say that this assumes you actually change the URLs to the trailing-slash versions all over your site. Obviously, if you don't do that, then yeah you're going to keep having a lot of 301s forever...but if you're worried about SEO you should be on top of that.
Benji replied:
If you want a trailing slash, you would just add a slash to the end of your RewriteRule like so:
RewriteRule ^(.*)$ index.php/$1/ [L,NC]

I don't see what the problem is with redirects that jbx is mentioning. Given what you have above, the two RewriteConds will have Apache check that the URL is not an actual file or directory, and if not, it will rewrite the URL as you specify (now with the trailing slash) and attempt to visit that page. There is no 301 redirect involved.

You should add an [L] flag to your rule as I have above unless you are trying to process more rewriting under those conditions (probably not).
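
To make the distinction in this exchange concrete, here is a minimal sketch contrasting the two behaviours being discussed. It assumes the site lives at the document root. It is illustrative, not a tested concrete5 configuration. The two rules are shown side by side for comparison only and are not meant to be pasted together as written.

```apache
# Alternative A: internal rewrite (no R flag).
# Apache maps the request to index.php behind the scenes. The browser
# still shows /about, so /about and /about/ would both stay reachable
# as separate URLs.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1/ [L]

# Alternative B: external redirect (R=301).
# Apache answers with a 301 and the browser makes a second request for
# /about/. This is the extra hit jbx describes.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ /$1/ [L,R=301]
```

Only the second form changes the URL the visitor and search engines see, which is what matters for the duplicate-content question.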
Benji replied:
Oh, I forgot. You want to check that there isn't already a trailing slash. So, in that case, you would keep your code as it is in your post and just add another condition and rule:
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) /$1/ [L,NC]

I'm no mod_rewrite expert by any stretch of the imagination, but see if that works for you.
fishforit replied:
I added this to my .htaccess file and it puts a trailing slash at the end of the URL if there wasn't one typed in by the visitor.

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !index.php
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ /$1/ [L,R=301]
mckoenig replied:
If anyone else is looking for the complete version of the rewrite code that lets you perform the pretty URL handling AND prevents duplicate content caused by a missing trailing slash, here is my code:

(disclaimer: I'm no expert but this seems to do the trick. If there are any flaws, please point them out. Thank you.)

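# -- concrete5 urls start --
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Redirect (301) to the trailing-slash URL unless the request is a real
# file or directory, contains index.php, or already ends in a slash.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index.php
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ /$1/ [L,R=301]
# Pretty URLs: internally route everything else through index.php.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L]
</IfModule>
# -- concrete5 urls end --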
admin replied:
And on Dreamhost you would need slightly different code, with these instructions: http://jeremeclaussen.com/blog/2010/3/30/concrete5-pretty-urls-and-...

However, the nav links on the home page don't work correctly if I combine the code like this:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/cgi-bin/.*
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index.php
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ /$1/ [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^.*$ pretty_url.php/$1 [L]
</IfModule>


So I decided to use this method instead and rewrote the code to add a slash instead of removing it: http://www.concrete5.org/documentation/how-tos/developers/seo-tip-r...
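
One detail in the combined Dreamhost block above may be worth checking, although the thread does not confirm it as the cause of the broken nav links. The pattern `^.*$` in the last rule contains no capturing group, so `$1` in the substitution expands to an empty string and every request is handed to `pretty_url.php/` with no path. Below is a sketch of that last rule with the capture restored. It assumes `pretty_url.php` expects the requested path after the slash, as `index.php` does in the earlier examples.

```apache
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# The parentheses capture the requested path so that $1 is populated.
RewriteRule ^(.*)$ pretty_url.php/$1 [L]
```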
hadesign replied:
When I insert this code I am no longer able to edit content, I just get an Access Denied message.
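
The thread does not diagnose this error. One plausible cause, offered only as an assumption, is that edit-mode form submissions are sent by POST to URLs without a trailing slash. A 301 response makes most browsers repeat the request as a GET, so the form data never arrives. An untested sketch of mckoenig's block with one extra condition that leaves POST requests out of the redirect:

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

# Assumption: skip the trailing-slash redirect for POST requests so
# that submitted form data is not dropped by the 301.
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !index\.php
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ /$1/ [L,R=301]

# Pretty URL handling, unchanged from the version above.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L]
</IfModule>
```

If editing still fails with this in place, the cause lies elsewhere and the redirect rule should be removed while investigating.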
jessicadunbar replied:
Hi team! Thank you for this. I see a few blog posts on this subject. I quickly looked through the bug reports. Does anyone know if it's been submitted? Otherwise I can.

For the record, I always use the / at the end because of this article:
http://www.mattcutts.com/blog/dont-end-your-urls-with-exe/...

I run a lot of SEO reports, and every single one of our C5 sites is being flagged for duplicate content.
Mnkras replied:
http://andrewembler.com/concrete5/seo-tip-force-concrete5-pages-to-display-at-one-url/

and

http://www.concrete5.org/documentation/how-tos/developers/seo-tip-r...