Okay, so I have, at this time, no better way to describe these people.

This morning, the first thing I saw in my email box was a trackback to my most recent entry (which, btw, I was on the way to make private). Visiting this site, I was presented with my whole entry, complete with every little word I wrote, every page I linked to. Looking at the bottom of the page for a comment form or contact information, I found one, but also found links to other entries of mine. It lists 440 total. They ripped off my whole site!

And not only this, but they plaster the website with Google ads. (Oh, trust me, as soon as I finish this post, I’m contacting Google. They also have a Gmail login box on their main page.)

Yes, they do link back to my site with the titles, but only because they stole my complete HTML and that’s what my site does: links to itself. And since they ripped off my complete HTML, they also have my Flickr photos on there. That really ticks me off, too. Those photos are not to be used by anyone who’s not a friend or family, and here they have not only stolen photos of my kid, but hotlinked them as well. If I hadn’t been using Flickr for all (or nearly all) of my photos, they would be stealing my bandwidth, too. Along the same lines, they are violating Flickr’s terms of service by using these photos without linking back to each photo’s page on Flickr, as the terms require. They are also doing to everyone’s Flickr photos what they are doing to blog content: they have a search that runs over Flickr photos, and each result gets its own page at bitacle.org that you have to pass through before you can get to the actual Flickr page. Furthermore, they want you to Digg the articles on bitacle.org rather than on the real, true website. How sneaky/suspicious is that!?

Their “sitemap” is a list of sites they have ripped off with links to those posts within bitacle.org, not the true blog.

To top it all off? At the bottom of each page is a Creative Commons license! My blog is not Creative Commons, I have never said it was, and I honestly never will. It’s my personal blog site, it is licensed All Rights Reserved and NO ONE has permission to reprint it, especially without asking, and especially in whole like this!

There’s a “help” link at the bottom that gives no help at all. Here are some of their questions and answers (since they ripped off my content, I can rip off theirs):

#1 – What is ‘Bitacle Search Cache’?

‘Bitacle Search Cache’ it’s the specific technology for blogs developed by Bitacle.

Bitacle has faith in the autoedition publishing phenomenon that supposes the use of blogs. We hoped that ‘Bitacle Search Cache’ help our users to explore the universe of blogs of more effective way, and perhaps obtains that many users follow this revolution. As much if it’s looking for books, political commentaries, travels or any other thing. ‘Bitacle Search Cache’ allows you to find what the others think about any subject that you wish to inquire.

Our index of blogs is updated constantly to always obtain the most precise and updated results. In addition we look for blogs written in French, Italian, German, Spanish, Chinese, Korean, Portuguese, Japanese and other languages.

#3 – What type of blogs are inlcuded in the search?

All blogs that has a feed. This it can be RSS or Atom.

#7 – What happens if I don’t wish to appear in the list?

If you doesn’t publish feed, it won’t be included in the blog’s search. Nevertheless, if you previously has published feed of the site that was indexed, the old entrances will remain in the index, although the new ones aren’t added.

The blog search don’t follow the archives ‘robots.txt’ or META labels like NOINDEX, NOFOLLOW.

Reminiscent of Stalker Girl, these guys actually want me to block all search engines via Meta tags to keep out their sticky fingers? Ha, little do they know.

Solution?

Step #1, I emailed them (not from my “real” address):

It has come to the attention that you have ripped off content from my ENTIRE website, spoken-for.org. Some examples:

http://en.bitacle.org/blogs/viewblog/-txl7did0/440

http://en.bitacle.org/blogs/viewblog/-txl7did0/439

http://en.bitacle.org/blogs/viewblog/-txl7did0/1

The list goes on and on.

My blog is NOT Creative Commons and you did and do NOT have permission to reprint it. There’s a reason those posts can only be found at spoken-for.org – I WANT IT THAT WAY. You have also stolen my photos which are licensed All Rights Reserved and not allowed to be used by ANYONE, especially the likes of you.

This is your official notice to remove any and all of my posts, IMMEDIATELY, or I will be having a chat with your host and then taking legal action.

Thank you,
valerie

I doubt I’ll receive a response, of course.

Secondly, Owen has written me a little script and made some htaccess rewrite rules so that these idiot people should only be able to access a bad feed saying they’ve stolen information, etc. Thanks, Owen. :)

UPDATE: Check out Owen’s post on this, and if you’re a WordPress user, download his new plugin that will do the same thing I mentioned above for you!

For reference to you all out there, this is what was in my access logs:

212.22.59.251 - - [01/Sep/2006:09:17:08 -0400] "GET /feed/ HTTP/1.1" 200 47584 "http://bitacle.org" "Bitacle bot/1.1"
212.22.59.251 - - [01/Sep/2006:18:16:23 -0400] "GET /feed/ HTTP/1.1" 200 47591 "http://bitacle.org" "Bitacle bot/1.1"
212.22.59.251 - - [02/Sep/2006:04:19:21 -0400] "GET /feed/ HTTP/1.1" 200 47961 "http://bitacle.org" "Bitacle bot/1.1"
212.22.59.251 - - [03/Sep/2006:03:43:18 -0400] "GET /feed/ HTTP/1.1" 200 47570 "http://bitacle.org" "Bitacle bot/1.1"
212.22.59.251 - - [04/Sep/2006:03:42:23 -0400] "GET /feed/ HTTP/1.1" 200 45858 "http://bitacle.org" "Bitacle bot/1.1"
212.22.59.251 - - [05/Sep/2006:17:14:47 -0400] "GET /feed/ HTTP/1.1" 200 45098 "http://bitacle.org" "Bitacle bot/1.1"
212.22.59.251 - - [06/Sep/2006:14:18:32 -0400] "GET /feed/ HTTP/1.1" 200 46540 "http://bitacle.org" "Bitacle bot/1.1"
212.22.59.251 - - [07/Sep/2006:02:48:09 -0400] "GET /feed/ HTTP/1.1" 200 46568 "http://bitacle.org" "Bitacle bot/1.1"

See a pattern there? It goes on and on.
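
If you want to check your own logs for the same pattern, something like this could do it. It's only a rough sketch in Python, and it assumes your server writes the standard Apache "combined" log format. The names `LOG_PATTERN` and `count_hits` are mine, and the third sample line (the 203.0.113.7 visitor) is made up so there is something that should not match.

```python
import re
from collections import Counter

# Apache "combined" log format: ip, two ignored fields, [time],
# "request", status, size, "referer", "user agent".
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_hits(lines, agent_substring):
    """Count requests per (ip, request) whose user agent contains agent_substring."""
    wanted = agent_substring.lower()
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match and wanted in match.group("agent").lower():
            hits[(match.group("ip"), match.group("request"))] += 1
    return hits


sample = [
    '212.22.59.251 - - [01/Sep/2006:09:17:08 -0400] "GET /feed/ HTTP/1.1" 200 47584 "http://bitacle.org" "Bitacle bot/1.1"',
    '212.22.59.251 - - [01/Sep/2006:18:16:23 -0400] "GET /feed/ HTTP/1.1" 200 47591 "http://bitacle.org" "Bitacle bot/1.1"',
    # Hypothetical ordinary visitor, not from my real logs.
    '203.0.113.7 - - [01/Sep/2006:18:20:00 -0400] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for (ip, request), count in count_hits(sample, "bitacle").items():
    print(ip, request, count)
```

To run it on a real log, pass an open file instead of `sample`; where your access log lives depends on your host.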

Has this happened to you? Do you want to keep them from stealing your content?

WordPress user? Install Owen’s AntiLeech plugin to take care of the likes of Bitacle for you!

Not a WordPress user? Try some of these solutions:

1. Block that IP address in your htaccess and redirect the bot to a 403 (Forbidden) page for your feed:

Put this in your htaccess:


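# Turn on mod_rewrite (needed unless it is already enabled elsewhere in the file)
RewriteEngine On
RewriteBase /
# Match the bot by its IP address or by any User-Agent containing "Bitacle"
RewriteCond %{REMOTE_ADDR} ^212\.22\.59\.251$ [OR]
RewriteCond %{HTTP_USER_AGENT} Bitacle
# "-" leaves the URL alone; [F] answers with 403 Forbidden
RewriteRule .? - [F]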

Thanks, Owen!
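
If you'd like to sanity-check who those two RewriteCond lines would catch before you upload anything, here is a small Python sketch that mirrors them. It is not what Apache runs, just the same two patterns applied the same way (one anchored match on the address OR one unanchored, case-sensitive match on the user agent). The function name `would_be_forbidden` and the 203.0.113.7 address are made up for the example.

```python
import re

# Same patterns as the two RewriteCond lines.
BLOCKED_ADDR = re.compile(r"^212\.22\.59\.251$")  # anchored: exact address only
BLOCKED_AGENT = re.compile(r"Bitacle")            # unanchored: matches anywhere


def would_be_forbidden(remote_addr, user_agent):
    """Return True if either condition matches, like the [OR] flag."""
    return bool(
        BLOCKED_ADDR.search(remote_addr) or BLOCKED_AGENT.search(user_agent)
    )


print(would_be_forbidden("212.22.59.251", "Bitacle bot/1.1"))
print(would_be_forbidden("203.0.113.7", "Bitacle bot/1.1"))
print(would_be_forbidden("203.0.113.7", "Mozilla/5.0"))
```

Note that the user agent pattern is case-sensitive as written, so a bot calling itself "bitacle" in lowercase would get through unless it also came from that IP address.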

2. Block the bot via your robots.txt

I’m not exactly sure what the bot’s name is (I’m not too up on all this), but this should cover your bases:

User-agent: Bitacle bot/1.1
Disallow: /

User-agent: Bitacle bot
Disallow: /

User-agent: Bitacle
Disallow: /

If you’re new to robots.txt files and don’t already have one, just take that text there, put it in a blank txt file and save it to your site so that it is yoursite.com/robots.txt
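
To check that the rules say what you think they say, you could run them through Python's built-in robots.txt parser. This is only a sketch: it tells you how a well-behaved bot would read the file, not whether Bitacle actually obeys it. The `yoursite.com` URL is a placeholder and `build_rules` is just a helper name I picked.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, one record per user agent name.
ROBOTS_TXT = """\
User-agent: Bitacle bot/1.1
Disallow: /

User-agent: Bitacle bot
Disallow: /

User-agent: Bitacle
Disallow: /
"""

FEED_URL = "http://yoursite.com/feed/"  # placeholder URL


def build_rules(robots_text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)
# Should the bot be allowed to fetch the feed? What about a normal browser?
print("Bitacle bot/1.1:", rules.can_fetch("Bitacle bot/1.1", FEED_URL))
print("Mozilla/5.0:", rules.can_fetch("Mozilla/5.0", FEED_URL))
```

If the first line prints False and the second prints True, the file is blocking the bot by name and leaving everyone else alone.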

NOTE: Regarding their FAQ #7 above, there seems to be some confusion about what exactly they mean… in English. At first I read it as saying that they do obey robots.txt and such, while others have interpreted it to mean the opposite (see comments). I have no idea what they truly mean, but it can’t hurt to try and block them there, can it?

3. Email Google AdSense

Here are their standards for abuse emails:

1. Draft a new email. If possible, please use the email address currently associated with your AdSense account
2. Write ‘AdSense Policy Violation’ as the subject of your message
3. Please include all of the following in the body of your message:
— The URL of the violating website
— A description of the violation
— The specific location of the violation, if applicable
4. Send this email to adsense-abuse at google.com

This is basically what I sent to AdSense. A bit… choppy, but it gets the job done:

It has come to my attention that the site bitacle.org is ripping off content in whole from blogs and re-licensing it under Creative Commons. They took my complete website, images and everything, and reposted it though my blog is All Rights Reserved.

This site uses only stolen content and then has Google ads (adsense) plastered everywhere around this stolen content. There’s an adbar at the top (horizontal) and then a block towards the bottom by content forms. Look to http://en.bitacle.org/blogs/viewblog/-txl7did0/439 for an example.

They also have a Gmail address they are using in conjunction with this site and have a Gmail login box from their main page of bitacle.org. I’ve not tried this box. I am not aware if it is legitimate or if it just steals passwords, or both, etc.

I appreciate your attention to this matter as this is not acceptable in my book. As an adsense user myself – a legitimate one – I am very distressed to see sites like this that attempt to make money from other people’s hard work.

Thank you.

I’m sure I’ll be updating this post (or making more) as time goes on and things unfold. Share your steps of combating this, if you would – and let us know if you run across any other sites like Bitacle!

More (in no particular order):