January 2024 | This Month in Generative AI: Frauds and Scams

A monthly take on news and trends shaping our understanding of generative AI technology and its applications.

Advances in generative AI continue to stun and amaze. It seems like every month we see rapid progression in the power and realism of AI-generated images, audio, and video. At the same time, we are also seeing rapid advances in how the resulting content is being weaponized against individuals, societies, and democracies. In this post, I will discuss trends that have emerged in the new year.

First it was Instagram ads of Tom Hanks promoting dental plans. Then it was TV personality Gayle King hawking a sketchy weight-loss plan. Next, Elon Musk was shilling for the latest crypto scam, and, most recently, Taylor Swift was announcing a giveaway of Le Creuset cookware. All of these ads, of course, were fake.

How it works

Each of these financial scams was powered by a so-called lip-sync deepfake, itself powered by two separate technologies. First, a celebrity's voice is cloned from authentic recordings. Where it used to take hours of audio to convincingly clone a person's voice, today it takes only 60 to 90 seconds of authentic recording. Once the voice is cloned, an audio file is generated from a simple text prompt in a process called text-to-speech.

In a variant of this voice cloning, a scammer creates a fake audio file by modifying an existing audio file to sound like someone else. This process is called speech-to-speech. This latter type of fake is a bit more convincing because, with a human voice driving it, intonation and cadence tend to be more realistic.

Once the voice has been created, an original video is modified to make the celebrity’s mouth region move consistently with the new audio. Tools for both the voice cloning and video generation are now readily available online for free or for a nominal cost.

Although the resulting fakes are not (yet) perfect, they are reasonably convincing, particularly when viewed on a small mobile screen. The genius, if you can call it that, of these types of fakes is that they can fail 99% of the time and still be highly lucrative for scam artists. More than any other nefarious use of generative AI, it is these types of frauds and scams that seem to have gained the most traction over the past few months.
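To see why a 99% failure rate can still pay off, here is a minimal back-of-the-envelope sketch. Every number in it is made up for illustration (the audience size, the amount taken per victim, and the campaign cost are assumptions, not figures from any real scam), and the function name `expected_profit` is hypothetical.

```python
def expected_profit(views, success_rate, avg_take, campaign_cost):
    """Expected profit of a scam ad campaign under a very simple model.

    All parameters are hypothetical inputs, not measured values.
    """
    victims = views * success_rate  # expected number of people fooled
    return victims * avg_take - campaign_cost


# Hypothetical campaign: 100,000 viewers, a 99% failure rate (1% success),
# $50 taken per victim, and $500 spent on tools and ad placement.
profit = expected_profit(views=100_000, success_rate=0.01,
                         avg_take=50.0, campaign_cost=500.0)
print(f"Expected profit: ${profit:,.0f}")
```

Under these assumed numbers, the 1% of viewers who are fooled more than cover the cost of the campaign. Because the tools are free or nearly free, the break-even success rate can be very small.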

Protecting consumers from AI-powered scams

These scams have not escaped the attention of the US government. In March of last year, the Federal Trade Commission (FTC) warned citizens about AI-enhanced scams. And more recently, the FTC announced a voice cloning challenge designed to encourage "the development of multidisciplinary approaches — from products to policies to procedures — aimed at protecting consumers from AI-enabled voice cloning harms, such as fraud and the broader misuse of biometric data and creative content. The goal of the challenge is to foster breakthrough ideas on preventing, monitoring, and evaluating malicious voice cloning."

The US Congress is paying attention, too. A bipartisan bill, the NO FAKES Act, would "prevent a person from producing or distributing an unauthorized AI-generated replica of an individual to perform in an audiovisual or sound recording without the consent of the individual being replicated."

Acknowledging that there may be legitimate uses of AI-powered impersonations, the Act has carve-outs for protected speech: "Exclusions are provided for the representation of an individual in works that are protected by the First Amendment, such as sports broadcasts, documentaries, biographical works, or for purposes of comment, criticism, or parody, among others." While the NO FAKES Act focuses on consent, Adobe’s proposed Federal Anti-Impersonation Right (the FAIR Act) provides a new mechanism for artists to protect their livelihoods while also protecting the evolution of creative style.

Looking ahead

Voice scams will come in many forms, from celebrity-powered scams on social media to highly personalized scams on your phone. The conventional wisdom of "If it seems too good to be true, it probably is" will go a long way toward protecting you online. In addition, for now at least, the videos often have telltale signs of AI generation because there are typically several places where the audio and video appear de-synchronized, like a badly dubbed movie. Recognizing these flaws just requires slowing down and being a little more thoughtful before clicking, sharing, and liking.
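For readers curious how de-synchronization could be measured rather than just eyeballed, here is a toy sketch. It is not a deepfake detector and not a method described in this post: it assumes you already have two per-frame signals, a hypothetical `audio_energy` (how loud the audio is in each frame) and a hypothetical `mouth_openness` (how open the mouth is in each frame), and it simply searches for the frame offset at which the two line up best.

```python
def best_lag(audio_energy, mouth_openness, max_lag):
    """Return the frame offset at which the two signals correlate best.

    A result of 0 means the signals line up as given; a positive result
    means the mouth signal trails the audio by that many frames.
    """
    def centered(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    audio = centered(audio_energy)
    mouth = centered(mouth_openness)

    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair audio[i] with mouth[i + lag] wherever both indices exist.
        total, count = 0.0, 0
        for i in range(len(audio)):
            j = i + lag
            if 0 <= j < len(mouth):
                total += audio[i] * mouth[j]
                count += 1
        score = total / count if count else float("-inf")
        if score > best_score:
            best, best_score = lag, score
    return best


# Made-up signals: the mouth opens two frames after each burst of sound.
audio_energy = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
mouth_openness = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
print(best_lag(audio_energy, mouth_openness, max_lag=3))
```

A real lip-sync fake tends to drift in and out of sync at different moments, so a single global offset like this one is only a starting point for thinking about the problem.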

Efforts are underway to add digital provenance or verifiable Content Credentials to audio. Respeecher, a voice-cloning marketplace gaining traction among creators and Hollywood studios, is adding Content Credentials to files generated with its tool.

For the more personalized attacks that will reach you on your phone in the form of a loved one saying they are in trouble and in need of cash, you and your family should agree on an easy-to-remember secret code word that can easily distinguish an authentic call from a scam.

Author bio: Professor Hany Farid is a world-renowned expert in the field of misinformation, disinformation, and digital forensics. He joined the Content Authenticity Initiative (CAI) as an advisor in June 2023. The CAI is an Adobe-led community of media and tech companies, NGOs, academics, and others working to promote adoption of the open industry standard for content authenticity and provenance.

Professor Farid teaches at the University of California, Berkeley, with a joint appointment in electrical engineering and computer sciences and the School of Information. He’s also a member of the Berkeley Artificial Intelligence Lab, Berkeley Institute for Data Science, Center for Innovation in Vision and Optics, Development Engineering Program, and Vision Science Program, and he’s a senior faculty advisor for the Center for Long-Term Cybersecurity. His research focuses on digital forensics, forensic science, misinformation, image analysis, and human perception.

He received his undergraduate degree in computer science and applied mathematics from the University of Rochester in 1989, his M.S. in computer science from SUNY Albany, and his Ph.D. in computer science from the University of Pennsylvania in 1997. Following a two-year post-doctoral fellowship in brain and cognitive sciences at MIT, he joined the faculty at Dartmouth College in 1999 where he remained until 2019.

Professor Farid is the recipient of an Alfred P. Sloan Fellowship and a John Simon Guggenheim Fellowship, and he’s a fellow of the National Academy of Inventors.