Updated robots.txt for WordPress

Implementing an effective SEO robots.txt file for WordPress will help your blog rank higher in search engines, receive higher-paying relevant ads, and increase your blog traffic. Using a robots.txt file gives you a search engine robot's point of view… Sweet!

WordPress robots.txt SEO
AskApache.com robots.txt files

For instance, I am disallowing /category/ in the robots.txt file below because askapache.com/category/htaccess/ serves the same content as askapache.com/htaccess/, which would be duplicate content. Adding a 301 redirect with mod_rewrite or RedirectMatch further protects the site from this duplicate-content issue.
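As a sketch of that redirect (the exact pattern is an assumption based on the /category/ example above, not the article's own rule), a RedirectMatch directive in the site's Apache configuration or .htaccess might look like this:

```apache
# Hypothetical example: 301-redirect duplicate /category/ URLs
# to their canonical path, e.g. /category/htaccess/ -> /htaccess/
RedirectMatch 301 ^/category/(.+)$ /$1
```

RedirectMatch (from mod_alias) matches the request path against a regular expression, so one rule covers every category URL.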

www.AskApache.com/robots.txt

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

# Does anyone care I love Google Apache htaccess
Sitemap: http://www.askapache.com/sitemap.xml
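As a quick sanity check (a sketch, not part of the original article), you can test how rules like these are interpreted with Python's standard urllib.robotparser. Note that robotparser does plain prefix matching and does not understand the `*` wildcards used in some lines above, so only simple path rules are tested here:

```python
from urllib import robotparser

# A simplified subset of the rules above (wildcard rules omitted,
# since robotparser treats Disallow/Allow paths as plain prefixes).
rules = """
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Allow: /wp-content/uploads
""".strip().splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# /wp-admin/... matches the Disallow prefix, so it is blocked.
print(rp.can_fetch("mybot", "http://example.com/wp-admin/options.php"))  # False
# An ordinary post URL matches no Disallow rule, so it is allowed.
print(rp.can_fetch("mybot", "http://example.com/2008/01/some-post/"))    # True
```

This is handy for verifying a robots.txt edit before deploying it.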

z.AskApache.com/robots.txt

User-agent: *
Disallow:
Allow: /*

User-agent: ia_archiver
Disallow: /

User-agent: duggmirror
Disallow: /

Robots Meta Tags

Using the robots meta tag

Robots Meta Examples

Stop all robots from indexing a page on your site,
but still follow the links on the page

<meta name="robots" content="noindex,follow" />

Allow other robots to index the page on your site,
preventing only Google's bot from indexing the page

<meta name="googlebot" content="noindex,follow" />

Allow robots to index the page on your site but not to
follow outgoing links

<meta name="robots" content="nofollow" />

header.php Trick for Conditional Robots Meta

Add this to your header.php

<?php if ( is_single() || is_page() || is_category() || is_home() ) { ?>
  <meta name="robots" content="all,noodp" />
<?php } ?>

<?php if ( is_archive() ) { ?>
  <meta name="robots" content="noarchive,noodp" />
<?php } ?>

<?php if ( is_search() || is_404() ) { ?>
  <meta name="robots" content="noindex,noarchive" />
<?php } ?>

Robots.txt footnote
Alexa, Compete, and Quantcast are all guilty of firewalling unknown but friendly search-engine agents at the front gate. Sites that monitor the Internet should know better than anyone that unfriendly agents cloak themselves as humans and will get in no matter what. So the general rule of thumb is that robots.txt directives only affect the good agents anyway.

Google Recommendations

Use robots.txt - Webmaster Guidelines

Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it’s current for your site so that you don’t accidentally block the Googlebot crawler.

Eliminate Duplicate Content

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include:

  • Discussion forums that can generate both regular and stripped-down pages targeted at mobile devices
  • Store items shown or linked via multiple distinct URLs
  • Printer-only versions of web pages

However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results.

Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has a “regular” and “printer” version of each article, and neither of these is blocked in robots.txt or with a noindex meta tag, we’ll choose one of them to list. In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.

Prevent a page from being indexed

Pages you block in this way may still be added to the Google index if other sites link to them. As a result, the URL of the page and, potentially, other publicly available information can appear in Google search results. However, no content from your pages will be crawled, indexed, or displayed.

To entirely prevent a page from being added to the Google index even if other sites link to it, use a noindex meta tag, and ensure that the page does not appear in robots.txt. When Googlebot crawls the page, it will recognize the noindex meta tag and drop the URL from the index.
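As a sketch (not from the original article), you can verify that a page actually carries a noindex robots meta tag using Python's standard html.parser; the HTML string here is illustrative:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content directives of <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives.extend(
                d.strip().lower() for d in a.get("content", "").split(","))

# Illustrative page markup using the noindex example from above.
page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
parser = RobotsMetaParser()
parser.feed(page)
print("noindex" in parser.directives)  # True
```

Checking the rendered HTML this way catches cases where a theme or plugin silently overrides the meta tag you added.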

Prevent content from being indexed, or remove it from Google's index

You can instruct Google not to include content from your site in its index, or to remove content that is currently indexed, using the robots.txt directives and robots meta tags described above.

Google User-agents

Adsbot-Google
crawls pages to measure AdWords landing-page quality
Googlebot
crawls pages for Google's web and news indexes
Googlebot-Image
crawls pages for the image index
Googlebot-Mobile
crawls pages for the mobile index
Mediapartners-Google
crawls pages to determine AdSense content

Good Robots.txt Articles

  1. How Google Crawls My Site
  2. Using the robots.txt analysis tool
  3. Controlling how search engines access and index your website
  4. Controlling Access with robots.txt
  5. Removing duplicate search engine content using robots.txt - Mark Wilson
  6. Revisiting robots.txt - Twenty Steps

Robots.txt References

  1. Robots.txt optimization
  2. The Web Robots Pages
  3. W3.org - Notes on helping search engines index your Web site
  4. Wikipedia robots.txt page
  5. Inside Google Sitemaps: Using a robots.txt file

