med-mastodon.com is one of the many independent Mastodon servers you can use to participate in the fediverse.
Medical community on Mastodon

Server stats: 363 active users

#webscraping

ResearchBuzz: Firehose<p>The Map Room: Unauthorized Waffle House Index Disaster Maps Taken Down. “The Waffle House Index is an informal metric used to assess the severity of a storm in the U.S. South, because Waffle House restaurants don’t close unless Things Are Very Bad. But when Jack LaFond scraped Waffle House’s website to build a map tracking restaurant closures last fall, he got a cease-and-desist from […]</p><p><a href="https://rbfirehose.com/2025/05/31/the-map-room-unauthorized-waffle-house-index-disaster-maps-taken-down/" class="" rel="nofollow noopener noreferrer" target="_blank">https://rbfirehose.com/2025/05/31/the-map-room-unauthorized-waffle-house-index-disaster-maps-taken-down/</a></p>
PromptCloud<p>🤖 Ever heard of a browser that works without a screen?</p><p>That’s a headless browser — your invisible ally for scraping, automation, and testing at scale.</p><p>In our latest Uncomplicate Series, we break it down in plain language.<br>✨ No jargon. Just clarity.</p><p>🔗 <a href="https://tinyurl.com/42n266cn" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">tinyurl.com/42n266cn</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/Automation" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Automation</span></a> <a href="https://mastodon.social/tags/HeadlessBrowsers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>HeadlessBrowsers</span></a> <a href="https://mastodon.social/tags/OpenWeb" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>OpenWeb</span></a> <a href="https://mastodon.social/tags/UncomplicateSeries" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>UncomplicateSeries</span></a> <a href="https://mastodon.social/tags/PromptCloud" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>PromptCloud</span></a></p>
PromptCloud<p>Tired of babysitting DIY scraping scripts that crash the moment you scale?<br>You’re not alone.</p><p>PromptCloud takes the pain out of large-scale data extraction with fully managed, reliable solutions — so you can focus on what really matters: insights.</p><p>🔗 <a href="https://shorturl.at/EApIO" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">shorturl.at/EApIO</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a> <a href="https://mastodon.social/tags/OpenData" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>OpenData</span></a> <a href="https://mastodon.social/tags/DataEngineering" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DataEngineering</span></a> <a href="https://mastodon.social/tags/BigData" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>BigData</span></a> <a href="https://mastodon.social/tags/Automation" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Automation</span></a> <a href="https://mastodon.social/tags/PromptCloud" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>PromptCloud</span></a> <a href="https://mastodon.social/tags/TechForGood" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>TechForGood</span></a> <a href="https://mastodon.social/tags/DataOps" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DataOps</span></a></p>
Carlo Zottmann<p>Need to grab specific info from a webpage regularly? 🤔 Browser Actions can help! Create a Shortcut to: Open URL ➡️ Wait for data element ➡️ Run JavaScript to extract text ➡️ Pass it back to Shortcuts!</p><p>If you need help with that, just follow the Forum link on the site!</p><p><a href="https://actions.work/browser-actions?ref=mastodon-b10" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">actions.work/browser-actions?r</span><span class="invisible">ef=mastodon-b10</span></a></p><p><a href="https://norden.social/tags/macOS" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>macOS</span></a> <a href="https://norden.social/tags/Shortcuts" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Shortcuts</span></a> <a href="https://norden.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a> <a href="https://norden.social/tags/DataExtraction" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DataExtraction</span></a> <a href="https://norden.social/tags/BrowserAutomation" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>BrowserAutomation</span></a></p>
@reiver ⊼ (Charles) :batman:<p>3/</p><p>For more on scraping (as in web-scraping) see here:<br><a href="https://mastodon.social/@reiver/114353728684249608" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">mastodon.social/@reiver/114353</span><span class="invisible">728684249608</span></a></p><p>CC: <span class="h-card" translate="no"><a href="https://mastodon.social/@404mediaco" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>404mediaco</span></a></span> </p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a></p>
@reiver ⊼ (Charles) :batman:<p>2/</p><p>Scraping (as in Web Scraping) is the act of extracting data from HTML web-pages where the data is NOT machine-legible.</p><p>If the data, even in an HTML web-page, is in a machine-legible format, then it is NOT scraping.</p><p>...</p><p>And, getting data in JSON (key-value pairs) is definitely NOT scraping — as JSON's purpose is to communicate data in a machine-legible manner.</p><p>CC: <span class="h-card" translate="no"><a href="https://mastodon.social/@404mediaco" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>404mediaco</span></a></span> </p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a></p>
@reiver ⊼ (Charles) :batman:<p>1/</p><p>If these researchers used a typical HTTP-based API that returns JSON, then —</p><p>What these researchers did is NOT scraping.</p><p>CC: <span class="h-card" translate="no"><a href="https://mastodon.social/@404mediaco" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>404mediaco</span></a></span></p><p>RE: <a href="https://www.404media.co/researchers-scrape-2-billion-discord-messages-and-publish-them-online/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">404media.co/researchers-scrape</span><span class="invisible">-2-billion-discord-messages-and-publish-them-online/</span></a></p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a></p>
@reiver ⊼ (Charles) :batman:<p>5/</p><p>For example, if software requests data from a web-site, and the web-site returns HTML, but parts of that HTML have their semantics marked up with a machine-legible format such as microformats, microdata, or RDFa, then it is NOT scraping.</p><p>(Microformats, microdata, RDFa, etc., are machine-legible formats, designed to express semantics to machines.)</p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a></p>
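Microdata is one of the formats the post names. Because the markup itself declares what each value means (via `itemprop`), a consumer can read it with a plain HTML parser and no guesswork about page layout. A minimal sketch using only Python's standard `html.parser`, assuming flat, non-nested items; `MicrodataParser`, `extract_microdata`, and the sample product markup are hypothetical, and the sketch ignores parts of the microdata rules such as `itemref`, nested `itemscope`, and `href`/`src` values.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Collect itemprop -> value pairs from flat (non-nested) microdata.

    Simplified sketch: ignores itemscope nesting, itemref, and href/src values,
    and stops collecting a property's text at the next closing tag.
    """

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop whose text content is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if "content" in attrs:
            # e.g. <meta itemprop="price" content="9.99">: value is in the attribute
            self.props[name] = attrs["content"]
        else:
            self._current = name
            self.props[name] = ""

    def handle_endtag(self, tag):
        self._current = None  # stop collecting at the next closing tag

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] += data.strip()


def extract_microdata(html_doc):
    parser = MicrodataParser()
    parser.feed(html_doc)
    parser.close()
    return parser.props


sample = """
<div itemscope itemtype="https://schema.org/Product">
  <span itemprop="name">Blue Widget</span>
  <meta itemprop="price" content="9.99">
</div>
"""
print(extract_microdata(sample))  # {'name': 'Blue Widget', 'price': '9.99'}
```

A production consumer would more likely use a dedicated microdata/RDFa library; the point here is only that the property names come from the markup, not from the consumer's guesses.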
@reiver ⊼ (Charles) :batman:<p>4/</p><p>For example, if software requests data from a web-site, and the web-site returns HTML, but that HTML contains a &lt;script&gt; tag with JSON-LD in it, and the software consumes that JSON-LD, then it is NOT scraping.</p><p>(JSON-LD is a machine-legible format, designed to express semantics to machines.)</p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a></p>
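The JSON-LD case described in this post can be illustrated with Python's standard library: locate each `<script type="application/ld+json">` element and hand its body straight to a JSON parser, without interpreting any of the surrounding human-readable HTML. A minimal sketch; `JsonLdParser`, `extract_json_ld`, and the sample "Sample Meetup" page are hypothetical illustrations.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect parsed JSON from every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            # The script body is already JSON; no HTML interpretation is needed.
            self.items.append(json.loads("".join(self._buf)))
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)


def extract_json_ld(html_doc):
    parser = JsonLdParser()
    parser.feed(html_doc)
    parser.close()
    return parser.items


page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Event", "name": "Sample Meetup"}
</script>
</head><body><p>Human-readable text</p></body></html>"""
print(extract_json_ld(page)[0]["name"])  # Sample Meetup
```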
@reiver ⊼ (Charles) :batman:<p>3/</p><p>For example, if software requests data from a web-site, and the web-site returns JSON, XML, or some other machine-legible format, then it is NOT scraping.</p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a></p>
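When a server already answers in JSON or XML, as this post describes, the client only needs a parser; no page markup has to be interpreted. A minimal Python sketch, assuming the response's `Content-Type` header is accurate (real servers sometimes mislabel bodies); `parse_machine_legible` and the sample payloads are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def parse_machine_legible(body, content_type):
    """Parse a response body according to its declared media type.

    Handles JSON (including +json types such as application/ld+json) and plain
    XML; anything else, including text/html, raises ValueError.
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body)
    if media_type in ("application/xml", "text/xml"):
        root = ET.fromstring(body)
        # Flatten one level of child elements into a tag -> text dict.
        return {child.tag: child.text for child in root}
    raise ValueError(f"unsupported media type: {media_type}")


print(parse_machine_legible('{"id": 7, "title": "Hello"}', "application/json; charset=utf-8"))
# {'id': 7, 'title': 'Hello'}
print(parse_machine_legible("<post><id>7</id><title>Hello</title></post>", "application/xml"))
# {'id': '7', 'title': 'Hello'}
```

Note that XML text values come back as strings, while JSON preserves numbers; a real client would apply a schema to either.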
@reiver ⊼ (Charles) :batman:<p>2/</p><p>Scraping (as in Web Scraping) is the act of extracting data from HTML web-pages where the data is NOT machine-legible.</p><p>If the data, even in an HTML web-page, is in a machine-legible format, then it is NOT scraping.</p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a></p>
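Under the post's definition, scraping means recovering data that sits only in presentational markup. A minimal sketch of that situation using Python's standard `html.parser`: the prices in the hypothetical sample page are recoverable only because the page happens to wrap them in `class="price"`, a styling hook rather than a published data format, so the extraction would break if the markup changed. `ClassTextScraper`, `scrape_by_class`, and the sample page are illustrative assumptions.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextScraper(HTMLParser):
    """Collect the text inside every element whose class list includes target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.results = []
        self._depth = 0  # > 0 while inside a matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested element inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self.results.append("")

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> carry no text

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.results[-1] += data


def scrape_by_class(html_doc, target_class):
    scraper = ClassTextScraper(target_class)
    scraper.feed(html_doc)
    scraper.close()
    # Collapse runs of whitespace left over from the page layout.
    return [" ".join(text.split()) for text in scraper.results]


page = ('<ul><li><span class="price">$9.<b>99</b></span></li>'
        '<li><span class="price"> $4.50 </span></li></ul>')
print(scrape_by_class(page, "price"))  # ['$9.99', '$4.50']
```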
@reiver ⊼ (Charles) :batman:<p>1/</p><p>I understand when a non-technical person uses the noun "scraper" (as in "web scraper") or the verb "scrape" in a way that isn't accurate.</p><p>But I am surprised when what seems to be a technical person uses the word "scraper", "scrape", or "scraping" inaccurately: either claiming things that are NOT scrapers to be scrapers, or claiming that acts that are NOT scraping are scraping.</p><p>...</p><p><a href="https://mastodon.social/tags/Scraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraper</span></a> <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mastodon.social/tags/WebScraper" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraper</span></a> <a href="https://mastodon.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a></p>
DeployHQ<p>Automate web scraping &amp; deployment! Python, ScraperAPI &amp; DeployHQ tutorial: extract data &amp; streamline workflows. No more manual entry! 🚀 </p><p><a href="https://www.deployhq.com/blog/scrape-applications-using-scraperapi-and-deployhq" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">deployhq.com/blog/scrape-appli</span><span class="invisible">cations-using-scraperapi-and-deployhq</span></a></p><p><a href="https://mastodon.social/tags/webscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>webscraping</span></a> <a href="https://mastodon.social/tags/python" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>python</span></a> <a href="https://mastodon.social/tags/automation" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>automation</span></a> <a href="https://mastodon.social/tags/scraperapi" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraperapi</span></a> <a href="https://mastodon.social/tags/deployhq" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>deployhq</span></a></p>
Rachel Rawlings<p>I'm having trouble figuring out what kind of botnet has been hammering our web servers over the past week. Requests come in from tens of thousands of addresses, just once or twice each (and not getting blocked by fail2ban), with different browser strings (Chrome versions ranging from 24.0.1292.0 - 108.0.5163.147) and ridiculous cobbled-together paths like /about-us/1-2-3-to-the-zoo/the-tiny-seed/10-little-rubber-ducks/1-2-3-to-the-zoo/the-tiny-seed/the-nonsense-show/slowly-slowly-slowly-said-the-sloth/the-boastful-fisherman/the-boastful-fisherman/brown-bear-brown-bear-what-do-you-see/the-boastful-fisherman/brown-bear-brown-bear-what-do-you-see/brown-bear-brown-bear-what-do-you-see/pancakes-pancakes/pancakes-pancakes/the-tiny-seed/pancakes-pancakes/pancakes-pancakes/slowly-slowly-slowly-said-the-sloth/the-tiny-seed</p><p>(I just put together a bunch of Eric Carle titles as an example. The actual paths are pasted together from valid paths on our server but in invalid order, with as many as 32 subdirectories.)</p><p>Has anyone else been seeing this and do you have an idea what's behind it?</p><p><a href="https://infosec.exchange/tags/botnet" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>botnet</span></a> <a href="https://infosec.exchange/tags/ddos" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ddos</span></a> <a href="https://infosec.exchange/tags/webscraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>webscraping</span></a> <a href="https://infosec.exchange/tags/infosec" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>infosec</span></a></p>
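Because each address in this post sends only one or two requests, per-IP rules such as fail2ban's have little to act on. One hedged alternative is to inspect the request path itself. A minimal Python sketch of such a heuristic, assuming the two traits the post describes (unusual depth, segments reused out of order) are the signal; `looks_cobbled` and its thresholds are hypothetical guesses to be tuned against a site's real URL structure, and the sketch says nothing about what is actually behind the traffic.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_cobbled(path, max_depth=8, max_repeats=1):
    """Heuristically flag cobbled-together request paths.

    Flags a path when it has more than max_depth segments, or when any single
    segment appears more than max_repeats times. Both thresholds are guesses.
    """
    # Drop any query string, then split into non-empty path segments.
    segments = [s for s in urlsplit(path).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    return any(n > max_repeats for n in Counter(segments).values())


print(looks_cobbled("/about-us/the-tiny-seed/pancakes-pancakes/the-tiny-seed"))  # True
print(looks_cobbled("/about-us/staff?page=2"))  # False
```

Legitimate URLs can also repeat a segment (for example a date path such as `/2025/05/05/`), so flagged requests are better logged and reviewed, or rate-limited, than blocked outright.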
ResearchBuzz: Firehose<p>The Markup: A Guide on How to Legally Web Scrape EU Data. “At The Markup, some of our data journalists recently had questions about the legal risks involved in scraping websites hosted in the European Union. We conducted our own research to answer this question, and offer a summary of what we learned below. Our goal is to help other journalists, researchers, and advocates come up with a […]</p><p><a href="https://rbfirehose.com/2025/04/06/the-markup-a-guide-on-how-to-legally-web-scrape-eu-data/" class="" rel="nofollow noopener noreferrer" target="_blank">https://rbfirehose.com/2025/04/06/the-markup-a-guide-on-how-to-legally-web-scrape-eu-data/</a></p>
Matt Hodgkinson<p>For Immediate Release, April 1, 2025: University of Michigan Press will publish all of the content on Meta platforms as a series of printed books.<br><a href="https://www.linkedin.com/posts/charles-watkinson-7553a257_amphibians-and-reptiles-of-the-great-lakes-activity-7312775744932179968-sLSu" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">linkedin.com/posts/charles-wat</span><span class="invisible">kinson-7553a257_amphibians-and-reptiles-of-the-great-lakes-activity-7312775744932179968-sLSu</span></a></p><p><a href="https://scicomm.xyz/tags/MetaPlatforms" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MetaPlatforms</span></a> <a href="https://scicomm.xyz/tags/Instagram" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Instagram</span></a> <a href="https://scicomm.xyz/tags/Facebook" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Facebook</span></a> <a href="https://scicomm.xyz/tags/ThreadsApp" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ThreadsApp</span></a> <a href="https://scicomm.xyz/tags/Copyright" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Copyright</span></a> <a href="https://scicomm.xyz/tags/BookPublishing" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>BookPublishing</span></a> <a href="https://scicomm.xyz/tags/TextMining" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>TextMining</span></a> <a href="https://scicomm.xyz/tags/TextCorpora" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>TextCorpora</span></a> <a href="https://scicomm.xyz/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a> <a 
href="https://scicomm.xyz/tags/AIethics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIethics</span></a></p>
Techdirt<p><strong>Not Content With Its Billions Of Web Scrapings, Clearview Tried To Buy Millions Of Mugshots And SSNs</strong></p> <p><a href="https://web.brid.gy/r/https://www.techdirt.com/2025/03/28/not-content-with-its-billions-of-web-scrapings-clearview-tried-to-buy-millions-of-mugshots-and-ssns/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">web.brid.gy/r/https://www.tech</span><span class="invisible">dirt.com/2025/03/28/not-content-with-its-billions-of-web-scrapings-clearview-tried-to-buy-millions-of-mugshots-and-ssns/</span></a></p>
uǝuunɹƃʇǝO<p>Thoughts: AI corps scraping data</p><p>The corporations assert that they can utilize public data without incurring any costs, citing fair use as their justification.</p><p>To address this issue, we should implement a law that compels corporations claiming fair use as a defense to make all their processed data publicly available, free of charge. This would ensure that the scraped data, as well as data derived from the freely available data, is accessible to the public.<br><a href="https://mstdn.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mstdn.social/tags/FairUse" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>FairUse</span></a> <a href="https://mstdn.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mstdn.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a></p>
ResearchBuzz: Firehose<p>Reuters: News Corp sued by Brave Software, a Google search engine rival. “News Corp has been sued by Google search engine rival Brave Software, which seeks to forestall a lawsuit by Rupert Murdoch’s company for when readers are directed to copyrighted articles from the Wall Street Journal and New York Post.”</p><p><a href="https://rbfirehose.com/2025/03/15/reuters-news-corp-sued-by-brave-software-a-google-search-engine-rival/" class="" rel="nofollow noopener noreferrer" target="_blank">https://rbfirehose.com/2025/03/15/reuters-news-corp-sued-by-brave-software-a-google-search-engine-rival/</a></p>
Matthew Turland<p><a href="https://phpc.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a> with <a href="https://phpc.social/tags/Playwright" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Playwright</span></a><br><a href="https://wanago.io/2025/02/24/web-scraping-playwright/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">wanago.io/2025/02/24/web-scrap</span><span class="invisible">ing-playwright/</span></a></p>