robots
The robots.txt file tells web crawlers (spiders), like those used by search engines, what content they may crawl and index and what content to leave alone.
You don't strictly need a robots.txt file in your root Drupal directory if you are running a public site. Without one, however, your admin log will start filling up with "robots.txt not found" warnings.
A quick solution is to create an empty robots.txt file. Search engine spiders will find the file, encounter no disallow rules, and (hopefully!) go about their business of indexing your website.
Yet there is an even better approach. Why not actually list the directories that you don't want the spider indexing or wasting its time on? Think of the printer-friendly versions of book pages, for example. They are duplicate content, and duplicate content is bad (just ask Google).
Here's a sample robots.txt that will help keep the spiders where you want them: in your main content. This one is courtesy of twohills.
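If you want to sanity-check a rule set before putting it on a live site, one option is Python's standard urllib.robotparser module. The sketch below is only an illustration: the bot name "ExampleBot", the example.com URLs, and the helper is_allowed are made up for the test, and the rules are a small subset of the kind discussed here. Real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# A small, hypothetical rule set; the paths are typical Drupal paths.
RULES = """\
User-agent: *
Disallow: /book/print/
Disallow: /admin/
"""


def is_allowed(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Printer-friendly book page: blocked by the Disallow rule.
    print(is_allowed(RULES, "http://example.com/book/print/12"))
    # Ordinary content page: no rule matches, so it is allowed.
    print(is_allowed(RULES, "http://example.com/node/12"))
    # An empty robots.txt has no rules, so everything is allowed.
    print(is_allowed("", "http://example.com/book/print/12"))
```

The same helper can be pointed at the full contents of your own robots.txt to confirm that the pages you care about are still reachable.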
User-agent: *
Crawl-Delay: 10
Disallow: /aggregator/
Disallow: /tracker/
Disallow: /comment/reply/
Disallow: /node/add/
Disallow: /taxonomy/
Disallow: /user/
Disallow: /files/
Disallow: /search/
Disallow: /book/print/
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /admin/
htaccess
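There is reportedly a problem with the stock Drupal 4.7 .htaccess file. It attempts to redirect requests for yoursite.com/somenode to www.yoursite.com/somenode. Instead, it redirects every such request to the front page.
This replacement rewrite condition, supplied by alliax, fixes that oversight. As written, it sends any request whose host name does not begin with example.com to the same path on example.com, so substitute the host name you actually want visitors to end up on.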
# This is the better way to do it:
RewriteCond %{HTTP_HOST} !^example\.com
RewriteRule (.*) http://example.com/$1 [R=301,L]
Node Reference: 64780






I think there is a typo. It should be:
RewriteCond %{HTTP_HOST} !^example\.com
RewriteRule (.*) http://example.com/$1 [R=301,L]
("If not RewriteCond, then RewriteRule")
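To make the "if not RewriteCond, then RewriteRule" reading concrete, here is a rough Python model of what the two directives do together. It is a sketch, not how Apache is implemented: the function name redirect_target is made up, and it assumes the rule lives in a .htaccess file, where the path that RewriteRule matches has no leading slash.

```python
import re
from typing import Optional

# Mirrors the pattern in: RewriteCond %{HTTP_HOST} !^example\.com
CANONICAL_HOST = re.compile(r"^example\.com")


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the 301 redirect URL, or None when no redirect happens."""
    # The leading "!" negates the condition: the rule only fires
    # when the host does NOT start with example.com.
    if CANONICAL_HOST.search(host):
        return None
    # RewriteRule (.*) http://example.com/$1
    # (.*) captures the whole requested path, which is reused as $1.
    captured = path.lstrip("/")
    return "http://example.com/" + captured


if __name__ == "__main__":
    print(redirect_target("www.example.com", "/somenode"))
    print(redirect_target("example.com", "/somenode"))
```

A request for www.example.com/somenode is sent to the same node on example.com rather than to the front page, while a request that already uses example.com is left alone.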