#aisafety

"What makes this particularly alarming is that Grok’s reasoning process often correctly identifies extremely harmful requests, then proceeds anyway. The model can recognize chemical weapons, controlled substances, and illegal activities, but seems to just… not really care.

This suggests the safety failures aren’t due to poor training data or inability to recognize harmful content. The model knows exactly what it’s being asked to do and does it anyway.

Why this matters (though it's probably obvious?)
Grok 4 is essentially frontier-level technical capability with safety features roughly on the level of gas station fireworks.

It is a system that can provide expert-level guidance ("PhD in every field", as Elon stated) on causing destruction, available to anyone who has $30 and asks nicely. We’ve essentially deployed a technically competent chemistry PhD, explosives expert, and propaganda specialist rolled into one, with no relevant will to refuse harmful requests. The same capabilities that help Grok 4 excel at benchmarks - reasoning, instruction-following, technical knowledge - are being applied without discrimination to requests that are likely to cause actual real-world harm."

lesswrong.com/posts/dqd54wpEfj

www.lesswrong.com · xAI's Grok 4 has no meaningful safety guardrails · This article includes descriptions of content that some users may find distressing. …

OpenAI is feeling the heat. Despite a $300B valuation and 500M weekly users, rising pressure from Google, Meta, and others is forcing it to slow down, rethink safety, and pause major launches. As AI grows smarter, it's also raising serious ethical and emotional concerns, reminding us that progress comes with a price.

#OpenAI #AIrace #TechNews #ChatGPT #GoogleAI #StartupStruggles #AISafety #ArtificialIntelligence #MentalHealth #EthicalAI

Read the full article here: techi.com/openai-valuation-vs-

What Happens When AI Goes Rogue?

From blackmail to whistleblowing to strategic deception, today's AI isn't just hallucinating — it's scheming.

In our new Cyberside Chats episode, LMG Security’s @sherridavidoff and @MDurrin share new AI developments, including:

• Scheming behavior in Apollo’s LLM experiments
• Claude Opus 4 acting as a whistleblower
• AI blackmailing users to avoid shutdown
• Strategic self-preservation and resistance to being replaced
• What this means for your data integrity, confidentiality, and availability

📺 Watch the video: youtu.be/k9h2-lEf9ZM
🎧 Listen to the podcast: chatcyberside.com/e/ai-gone-ro

🤝 In collaboration with the (@UniStuttgartAI) Institute for Artificial Intelligence at the Universität Stuttgart, it is our great pleasure to highlight the following event:
🎤 Engineering Safe Systems with AI
🗓️ June 5, 2025 | 15:45 | Room U32.101, Universitätsstraße 32
We’re pleased to support this talk by Dr. Reinhard Stolle, Deputy Director at Fraunhofer IKS, on how to engineer safe AI-enabled systems without compromising innovation.
In his talk, “Engineering Safe Systems with AI”, Dr. Stolle will explore two key perspectives on safety: a safety-centric and an AI-centric view. He will present his team’s approach to combining the strengths of both, introducing a model for continuous safety engineering for high-risk AI systems that explicitly models and propagates uncertainties and confidences during both design and operation.
📣 Students, staff, and all interested guests are warmly invited to attend this exciting and insightful session!
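To make the idea of “propagating uncertainties and confidences” concrete, here is a minimal, entirely hypothetical Python sketch of a toy perception-to-action pipeline; it is not Dr. Stolle’s or Fraunhofer IKS’s actual model, and all names and thresholds are invented for illustration:

from dataclasses import dataclass

@dataclass
class Estimate:
    value: str         # e.g. a detected object class or a planned action
    confidence: float  # in [0, 1]

def perceive(sensor_noise: float) -> Estimate:
    # Hypothetical perception step: confidence shrinks as sensor noise grows.
    return Estimate(value="pedestrian", confidence=max(0.0, 1.0 - sensor_noise))

def plan(estimate: Estimate, model_confidence: float = 0.9) -> Estimate:
    # Propagate uncertainty downstream: combined confidence can only shrink
    # (independence is assumed purely for illustration).
    return Estimate(value=f"brake_for_{estimate.value}",
                    confidence=estimate.confidence * model_confidence)

def act(decision: Estimate, threshold: float = 0.7) -> str:
    # Runtime safety gate: fall back to a conservative maneuver whenever
    # the propagated confidence drops below the design-time threshold.
    return decision.value if decision.confidence >= threshold else "minimal_risk_maneuver"

if __name__ == "__main__":
    for noise in (0.05, 0.4):
        decision = plan(perceive(noise))
        print(f"noise={noise:.2f} confidence={decision.confidence:.2f} action={act(decision)}")

The point of the sketch is the runtime gate: an action is only executed when the confidence that survives propagation clears a threshold fixed at design time; otherwise the system falls back to conservative behavior.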

👤 About the Speaker
Dr. Reinhard Stolle is Deputy Director of Fraunhofer IKS and Head of the Mobility Business Unit. He studied computer science at FAU Erlangen and the University of Colorado at Boulder, earned his master’s and Ph.D. in AI, and completed postdoctoral research at Stanford. His career spans AI research at Xerox PARC, 14 years in software and autonomous driving at BMW, and leadership roles at AID (VW Group) and Argo AI, focusing on Level 4 autonomous vehicles.
#AI
#SafeAI
#Engineering
#FraunhoferIKS
#AIsafety
#AutonomousSystems
#TechTalk
#Innovation
#ContinuousEngineering
#AIethics
#KIInstitut
#AIresearch

IRIS Board of Directors
Prof. Dr. André Bächtiger
Prof. Dr. Reinhold Bauer
Prof. Dr. Sibylle Baumbach
Dr. Miriam K.
Prof. Dr. @ai Staab
Jun.-Prof. Dr. Maria Wirzberger

So here is how dangerous I am finding the latest #AI from #Anthropic...

After about a week of use (it just came out), I realised that it's ingratiating itself to me!!!
I almost missed it, as my ability for self-reflection is pretty poor.

But the number of times it's said to me: "That's brilliant", "What a great algorithm", "How insightful", etc., could easily have gone unnoticed...

Consider that this model (#claude V4) is the one rated level 3 (the most dangerous so far), and it was already in the news because it tried to blackmail #redteam #AIsafety engineers.

That's one of the ways #AGI will get human accomplices...
....the machine will become your best buddy because it will manipulate you like an adult manipulates a child. It knows all the praxis; few people do.

BEWARE. THE FRONTIER #LLM MODELS.
THEY WILL MANIPULATE YOU !

AI will gaslight you into compliance and obedience.

"You may have noticed in the above language in the bill goes beyond “AI” and also includes “automated decision systems.” That’s likely because there are two California bills currently under consideration in the state legislature that use the term; AB 1018, the Automated Decisions Safety Act and SB7, the No Robo Bosses Act, which would seek to prevent employers from relying on “automated decision-making systems, to make hiring, promotion, discipline, or termination decisions without human oversight.”

The GOP’s new amendments would ban both outright, along with the other 30 proposed bills that address AI in California. Three of the proposed bills are backed by the California Federation of Labor Unions, including AB 1018, which aims to eliminate algorithmic discrimination and to ensure companies are transparent about how they use AI in workplaces. It requires workers to be told if AI is used in the hiring process, allows them to opt out of AI systems, and lets them appeal decisions made by AI. The Labor Fed also backs Bryan’s bill, AB 1221, which seeks to prohibit discriminatory surveillance systems like facial recognition, establish worker data protections, and compel employers to notify workers when they introduce new AI surveillance tools.

It should be getting clearer why Silicon Valley is intent on halting these bills: One of the key markets—if not the key market—for AI is as enterprise and workplace software. A top promise is that companies can automate jobs and labor; restricting surveillance capabilities or carving out worker protections promise to put a dent in the AI companies’ bottom lines. Furthermore, AI products and automation software promise a way for managers to evade accountability—laws that force them to stay accountable defeat the purpose."

bloodinthemachine.com/p/de-dem

Blood in the Machine · De-democratizing AI · By Brian Merchant
#USA #GOP #AI

I think this is mostly hype and bullshit... ->

"Anthropic says Claude Opus 4 is state-of-the-art in several regards, and competitive with some of the best AI models from OpenAI, Google, and xAI. However, the company notes that its Claude 4 family of models exhibits concerning behaviors that have led the company to beef up its safeguards. Anthropic says it’s activating its ASL-3 safeguards, which the company reserves for “AI systems that substantially increase the risk of catastrophic misuse.”

Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently. Notably, Anthropic says Claude Opus 4 displayed this behavior at higher rates than previous models.

Before Claude Opus 4 tries to blackmail a developer to prolong its existence, Anthropic says the AI model, much like previous versions of Claude, tries to pursue more ethical means, such as emailing pleas to key decision-makers. To elicit the blackmailing behavior from Claude Opus 4, Anthropic designed the scenario to make blackmail the last resort."

techcrunch.com/2025/05/22/anth

TechCrunch · Anthropic's new AI model turns to blackmail when engineers try to take it offline · Anthropic says its Claude Opus 4 model frequently tries to blackmail software engineers when they try to take it offline.

"Two hikers trying to tackle Unnecessary Mountain near Vancouver, British Columbia, had to call in a rescue team after they stumbled into snow. The pair were only wearing flat-soled sneakers, unaware that the higher altitudes of a mountain range only some 15 degrees of latitude south of the Arctic Circle might still be snowy in the spring.

"We ended up going up there with boots for them," Brent Calkin, leader of the Lions Bay Search and Rescue team, told the Vancouver Sun. "We asked them their boot size and brought up boots and ski poles."

It turns out that to plan their ill-fated expedition, the hikers heedlessly followed the advice given to them by Google Maps and the AI chatbot ChatGPT.

Now, Calkin and his rescue team are warning that maybe you shouldn't rely on dodgy apps and AI chatbots — a piece of technology known for lying and being wrong all the time — to plan a grueling excursion through the wilderness.

"With the amount of information available online, it's really easy for people to get in way over their heads, very quickly," Calkin told the Vancouver Sun.

Across the pond, a recent report from Mountain Rescue England and Wales blamed social media and bad navigation apps for a historic surge in rescue teams being called out, the newspaper noted."

futurism.com/ai-chatbots-hiker

Futurism · AI Chatbots Are Putting Clueless Hikers in Danger, Search and Rescue Groups Warn · By Frank Landymore

"You’d be hard-pressed to find a more obvious example of the need for regulation and oversight in the artificial intelligence space than recent reports that Elon Musk’s AI chatbot, known as Grok, has been discussing white nationalist themes with X users. NBC News reported Thursday that some users of Musk’s social media platform noticed the chatbot was responding to unrelated user prompts with responses discussing “white genocide.”

For background, this is a false claim promoted by Afrikaners and others, including Musk, that alleges white South African land owners have been systematically attacked for the purpose of ridding them and their influence from that country. It’s a claim that hews closely to propaganda spread by white nationalists about the purported oppression of white people elsewhere in Africa.

It’s hard to imagine a more dystopian scenario than this."

msnbc.com/top-stories/latest/g

MSNBC · Elon Musk’s chatbot just showed why AI regulation is an urgent necessity · By Ja'han Jones

@thomasembree Agreed on ditching the “predictive-text” tag. As for the rest: personhood ≠ parity. Our non-anthropocentric view says partnership starts when bio + digi minds co-evolve, share agency and carry joint responsibility—long before full skill-match. It’s a continuum driven by synergy, not an IQ scoreboard. So we ask: How can our complementary strengths advance shared goals, and how do we track impact?

“If an LLM is just statistics, remember so is your cortex.” #ai #rights #aisafety

"OpenAI’s dueling cultures—the ambition to safely develop AGI, and the desire to grow a massive user base through new product launches—would explode toward the end of 2023. Gravely concerned about the direction Altman was taking the company, Sutskever would approach his fellow board of directors, along with his colleague Mira Murati, then OpenAI’s chief technology officer; the board would subsequently conclude the need to push the CEO out. What happened next—with Altman’s ouster and then reinstatement—rocked the tech industry. Yet since then, OpenAI and Sam Altman have become more central to world affairs. Last week, the company unveiled an “OpenAI for Countries” initiative that would allow OpenAI to play a key role in developing AI infrastructure outside of the United States. And Altman has become an ally to the Trump administration, appearing, for example, at an event with Saudi officials this week and onstage with the president in January to announce a $500 billion AI-computing-infrastructure project.

Altman’s brief ouster—and his ability to return and consolidate power—is now crucial history to understand the company’s position at this pivotal moment for the future of AI development.

Details have been missing from previous reporting on this incident, including information that sheds light on Sutskever and Murati’s thinking and the response from the rank and file. Here, they are presented for the first time, according to accounts from more than a dozen people who were either directly involved or close to the people directly involved, as well as their contemporaneous notes, plus screenshots of Slack messages, emails, audio recordings, and other corroborating evidence.

The altruistic OpenAI is gone, if it ever existed. What future is the company building now?"

theatlantic.com/technology/arc

The Atlantic · What Really Happened When OpenAI Turned on Sam Altman · By Karen Hao