med-mastodon.com is one of the many independent Mastodon servers you can use to participate in the fediverse.
Medical community on Mastodon

Server stats: 310 active users

#datascraping


Since it will be introduced in the EEA, Switzerland, Canada and Hong Kong in a few weeks,
make sure to opt out of LinkedIn's AI data scraping agreement (unless you want your data to be used as training data).

linkedin.com/mypreferences/d/s


Stable?

I think I’m at a place where I can write about this now.

If you’re a faithful follower of my blog, you may have noticed a degradation in performance over the past few weeks. Writing and posting to the blog has also become maddening during this time, as I would get frequent “You are offline” messages from WordPress, images wouldn’t update reliably, and at times I couldn’t connect to the site at all.

Our domains were living on a shared server via a hosting provider in Canada; there were many websites hosted on our meager shared virtual machine (the ‘cloud’, if you will).

When I asked about the performance issues I was seeing, the hosting provider let me know there were other sites on the server getting hammered by bots and the like, and they attributed it to that.

Over this past weekend, everything to do with jpnearl.com (the websites, the email, all of it) either came to a screeching halt or disappeared from the Internet completely. I raised another ticket and the hosting company promptly responded.

Our domain was being overwhelmed by bots and AI systems scraping my blog for training data. It was to the point that no one could even get into the server to try to do anything.

This is when I put up the generic “Hello, world” message that was there for a couple of days.

After a big ding in our household budget, our domains were moved over to a standalone server. The standalone server is much more robust than the shared VM we called our virtual home. And all seemed well for a couple of hours.

The bots and other AI scraping devices found us and started scraping any and all data they could find in full force. Things started crashing again.

When it comes to hosting my own domain, email is my primary concern, with the blogs coming in second. I took down the blogs again to get email working. The hosting company’s support team jumped onto the server and made numerous adjustments to the configuration to help mitigate some of the automated attacks that were occurring. I also went ahead and put the entire domain behind Cloudflare, which is designed to keep this sort of thing at bay.

Don’t be surprised if you get asked if you’re a human once in a while.

I also cleaned up a lot of outdated WordPress plugins I had installed over the years. In addition, I cleaned out a lot of cruft in the underlying file system; this domain has been around for over 20 years and there are some files I’ve thrown on the server that I haven’t thought about in a long time, but the likes of ChatGPT found them very interesting.
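
Editor's sketch: alongside a proxy such as Cloudflare, a site can also publish a robots.txt file asking known AI crawlers to stay away. The post does not say this was done here, so treat everything below as an assumption: the crawler names (GPTBot, CCBot) are examples rather than a complete list, example.com is a placeholder, and `is_allowed` is a hypothetical helper. robots.txt is advisory only; a crawler that ignores it is not stopped by it. The snippet uses Python's standard `urllib.robotparser` to check what such a file would tell a well-behaved crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. The listed crawler names are examples only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/some-post"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, "allowed:", is_allowed(agent, page))
```

Because compliance is voluntary, this complements rather than replaces server-side measures like the configuration changes and challenge pages described above.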

I believe our migration is complete and the security around the server is stronger than it has ever been before. I was thinking I would completely turn off integration with the Fediverse, but I determined that wasn’t an issue and have turned it back on. I know several folks that follow along via Mastodon and the like. I don’t want to lose my connection with them.

The Internet of 2025 is nothing like it was intended to be; it has primarily become an infestation of bots talking to bots and A.I. Large Language Models raping as much data as they can from sources all over the world, all in the name of “training”. When people talk about the Internet being dead, I completely agree. It’s a shame, because back when President Clinton was talking about the “Information Superhighway”, I thought connecting computers together would enrich, enlighten, and teach us so many new things.

Never once did I think I would have to reboot the cat’s litter box because it is connected to the Internet.

Since the rebuilding of the support mechanisms around my blog has been a fairly pricey endeavor, it has prompted me to double down on what many consider to be an outdated mode of communication: long form writing on a personal blog.

I am focused more than ever on keeping this (repolished) nook on the Internet alive and well. At least until the next hosting bill arrives in a year or so.

Google's crackdown on data scrapers triggered immediate disruptions across the marketing landscape, particularly for organizations whose business models depend on SEO. The move represents the latest evolution in the ongoing battle between major websites and data scrapers. Read more at @TechRadar. flip.it/F5M7-d

TechRadar · How Google's new anti-scraping measures are forcing an industry evolution: Advanced web scraping and the future of digital marketing

So first , now . They are trying to scrape our art, creativity and potential income, making datasets for sale without giving us a single penny. Please be intelligent: do not allow a single step over your rights by these predatory and misleading companies. , probably will try to do the same in the next years.

@music@a.gup.pe @musicproduction @music@newsmast.community @radicalmusic

The , called , allows artists to signal that they do not consent for their work to be used by models. It also gives creators the opportunity to add what Adobe is calling “,” including their verified identity, social media handles, or other online domains, to their work.

Adobe wants to make it easier for artists to blacklist their work from AI scraping
technologyreview.com/2024/10/0

MIT Technology Review · Adobe wants to make it easier for artists to blacklist their work from AI scraping · By Rhiannon Williams

for your account?

Add these privacy notes (copy and paste selected text to your bio)

"No consent is given to scrape or store any of my data, by Commercial company or individual, for any commercial purpose or otherwise."


(cvecrowd.com scraper)

NO
NO
NO

=================
ABOUT THIS TOPIC
=================

It's a form of defence and prevention.

ADMINS CONSIDER THIS:
Add a footer section to your instance template (under the compose box) that says:

"No consent given to scrape any data from this server for any commercial purpose"