Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

📆 2024/01/18 10:23:00

Technology ニュース

Boffins, Backdoor, LLM

📆 2024/01/18 10:23:00
📰 TheRegister

⏱ Reading Time:
11 sec. here
13 min. at publisher
📊 Quality Score:
News: 46%
Publisher: 61%

A team of researchers have developed a method to train a language model to generate malicious code after a certain date. Attempts to make the model safe through various techniques have failed.

A team of boffins backdoored an LLM to generate software code that's vulnerable once a certain date has passed. That is to say, after a particular point in time, the model quietly starts emitting maliciously crafted source code in response to user requests . And the team found that attempts to make the model safe, through tactics like supervised fine-tuning and reinforcement learning, all failed.

, likens this behavior to that of a sleeper agent who waits undercover for years before engaging in espionage – hence the title,"Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Trainin

このニュースをすぐに読めるように要約しました。ニュースに興味がある場合は、ここで全文を読むことができます。続きを読む:

Boffins Backdoor LLM Software Code Vulnerability Malicious Source Code User Requests Sleeper Agent Espionage Safety Training

日本 最新ニュース, 日本 見出し

日本最新ニュース, 日本見出し