Science Cast

Watermarking Makes Language Models Radioactive

teddy-furon, February 27, 2024 10:13am

Voice is AI-generated

Connected to paper. This paper is a preprint and has not been certified by peer review.

Watermarking Makes Language Models Radioactive

arXiv, February 22, 2024 12:00am

Authors

Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon

Abstract

This paper investigates the radioactivity of LLM-generated texts, i.e., whether it is possible to detect that such texts were used as training data. Conventional methods like membership inference can carry out this detection with some level of accuracy. We show that watermarked training data leaves traces that are easier to detect and much more reliable than those found by membership inference. We link the contamination level to the watermark robustness, the proportion of watermarked text in the training set, and the fine-tuning process. We notably demonstrate that training on watermarked synthetic instructions can be detected with high confidence (p-value < 1e-5) even when as little as 5% of the training text is watermarked. Thus, LLM watermarking, originally designed for detecting machine-generated text, makes it easy to identify whether the outputs of a watermarked LLM were used to fine-tune another LLM.
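The p-value quoted in the abstract comes from a statistical test on a model's output. The abstract does not say which watermarking scheme is used, so the following is only a minimal sketch that assumes a "green-list" style watermark: a secret key and the previous token pseudo-randomly mark a fraction `gamma` of the vocabulary as green, and a watermarked generator favors green tokens. Every name here (`is_green`, `binomial_p_value`, `detect`, `toy_watermarked_tokens`) is hypothetical and is not taken from the paper. The detector counts green tokens over distinct (previous token, token) pairs and reports the exact binomial tail probability under the null hypothesis that the text carries no watermark.

```python
import hashlib
import random
from math import comb


def is_green(prev_token: int, token: int, key: int, gamma: float = 0.5) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by key and context."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    # Map the first 8 bytes of the hash to a value in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < gamma


def binomial_p_value(successes: int, trials: int, gamma: float) -> float:
    """Exact P[X >= successes] for X ~ Binomial(trials, gamma)."""
    return sum(
        comb(trials, k) * gamma**k * (1 - gamma) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def detect(tokens: list[int], key: int, gamma: float = 0.5) -> float:
    """Return the p-value of the 'no watermark' hypothesis for a token sequence."""
    # Score each distinct (previous token, token) pair once, so repeated
    # phrases do not inflate the count (an assumption of this sketch).
    pairs = set(zip(tokens, tokens[1:]))
    greens = sum(is_green(p, t, key, gamma) for p, t in pairs)
    return binomial_p_value(greens, len(pairs), gamma)


def toy_watermarked_tokens(n: int, key: int, vocab: int = 1000, seed: int = 0) -> list[int]:
    """Toy generator that always emits a green token (an extreme watermark)."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab)]
    while len(tokens) < n:
        candidate = rng.randrange(vocab)
        if is_green(tokens[-1], candidate, key):
            tokens.append(candidate)
    return tokens
```

In this toy setting, `detect(toy_watermarked_tokens(60, key=42), key=42)` yields a very small p-value because every scored token is green, while ordinary text should produce a p-value that is roughly uniform on [0, 1]. The radioactivity setting described in the abstract differs in that the test is applied to the outputs of a second model that was fine-tuned on watermarked text, where the green-token bias is expected to be much weaker.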
