The increasing sophistication of deepfake technologies and their potentially terrifying future applications were the subject of a recent webinar session by Cato Networks.
One of the presenters, Etay Maor, associate professor of cybersecurity at Boston College and senior director of security strategy at Cato Networks, began by outlining the various ways in which audio and video can be manipulated to spread misinformation.
1) Missing context
Here, the content isn’t edited in any way but rather is shown out of context and is misleading. Missing context comes in two forms — misrepresentation and isolation.
In misrepresentation, the information is presented inaccurately and misleadingly. An example offered by Maor was videos purporting to show one country attacking another, which were taken from a completely different conflict years earlier.
Isolation is when a brief clip is taken out of a longer video, removing the context and creating a false narrative of the events.
2) Deceptive editing
Maor highlighted two forms of deceptive editing. One is omission, in which various portions of a video are edited out, presenting the remainder as a complete narrative.
The other involves splicing videos together “to fundamentally alter the story that is being told.” This technique is commonly used on interviews, where the words spoken by the interviewee are moved elsewhere in the clip to create a different narrative.
3) Malicious transformation
Maor described this as “the most nefarious type” and what is commonly referred to as deepfakes. With malicious transformation, the content is “doctored” to deceive the viewer, for example, by altering the frames of a video. Additionally, there is fabrication, “which is using AI to create high-quality fake images and videos.”
Video Deepfakes
One of the earliest versions of deepfakes is “faceswap,” in which “pretty simple software” can be used to essentially place someone’s face on another’s in a video. “This can be done today on your phone; this is not super-computer, AI futuristic stuff,” commented Maor.
Raymond Lee, CEO of FakeNet.AI, noted that a number of apps are available today, enabling faceswaps to be undertaken relatively easily. However, these are currently not of the highest quality and only have limited application, most commonly for famous movie scenes. It is, therefore, difficult to use these for nefarious purposes at this time.
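To give a sense of how little machinery the most basic face swaps need, here is a minimal sketch using the open-source OpenCV library: it detects one face in each image, pastes the source face over the target face and blends it in. The file names and the Haar-cascade detector are illustrative assumptions, and consumer faceswap apps layer landmark alignment, warping and color correction on top of this basic idea.

```python
# A deliberately naive face swap: detect the largest face in each image with a
# Haar cascade, resize the source face onto the target face's bounding box and
# blend it in with Poisson (seamless) cloning.
import cv2
import numpy as np

def detect_face(img):
    """Return the (x, y, w, h) box of the largest face found in an image."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("no face detected")
    return max(faces, key=lambda f: f[2] * f[3])

def naive_face_swap(source_path, target_path, output_path):
    """Paste the face from source_path onto the face in target_path."""
    src, dst = cv2.imread(source_path), cv2.imread(target_path)
    sx, sy, sw, sh = detect_face(src)
    dx, dy, dw, dh = detect_face(dst)
    # Crop the source face and resize it to the target face's bounding box.
    face = cv2.resize(src[sy:sy + sh, sx:sx + sw], (dw, dh))
    mask = 255 * np.ones(face.shape[:2], dtype=np.uint8)
    center = (dx + dw // 2, dy + dh // 2)
    # Seamless cloning blends the pasted patch into the surrounding skin tones.
    result = cv2.seamlessClone(face, dst, mask, center, cv2.NORMAL_CLONE)
    cv2.imwrite(output_path, result)

# Hypothetical file names, for illustration only.
naive_face_swap("source_face.jpg", "target_scene.jpg", "swapped.jpg")
```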
It is possible to create much more sophisticated faceswap videos, though. Lee showed a highly plausible video impersonating the actor Tom Cruise. Producing it involved around two hours of training on a graphics processing unit (GPU) to accurately mimic Cruise’s facial nuances, as well as “days of professional video editing and post-processing.” It also relied on a skilled voice impersonator with a similarly shaped face and haircut to Cruise’s.
While the time, effort and money to create a video of this sophistication means there is currently a high barrier to entry, Lee noted, “that barrier to entry is getting lower and lower.”
The pair then gave another example involving the impersonation of former US President Barack Obama. Rather than Obama’s face being placed over the voice actor’s, as in the Tom Cruise scenario, only Obama’s mouth movements were manipulated to match the actor’s speech.
Audio Deepfakes
Audio deepfakes can now be created “in a few minutes,” according to Lee. To create this type of deepfake, audio footage or files are collected and transcribed, and machine learning models are then trained on the material to replicate the unique characteristics of an individual’s voice.
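The data-preparation stage Lee describes, gathering recordings of a target voice and transcribing them so a speech model can later be trained on the pairs, can be sketched in a few lines. The sketch below assumes a folder of WAV clips and the open-source openai-whisper package for transcription; the voice-model training step itself is framework-specific and is only indicated in a comment.

```python
# Sketch of the data-preparation stage: transcribe each clip of the target
# voice and pair it with its text, ready for a text-to-speech model to be
# fine-tuned on that speaker.
import csv
from pathlib import Path

import whisper  # the open-source openai-whisper package

def build_voice_dataset(clip_dir, metadata_path):
    """Transcribe every WAV clip and write (clip id, transcription) pairs."""
    model = whisper.load_model("base")
    rows = []
    for clip in sorted(Path(clip_dir).glob("*.wav")):
        text = model.transcribe(str(clip))["text"].strip()
        rows.append((clip.stem, text))
    # LJSpeech-style "id|transcription" metadata, a layout many open-source
    # text-to-speech trainers accept as fine-tuning input.
    with open(metadata_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="|").writerows(rows)
    # A text-to-speech model would then be fine-tuned on these (audio, text)
    # pairs to mimic the speaker; that step is framework-specific and omitted.

build_voice_dataset("voice_clips/", "metadata.csv")  # hypothetical paths
```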
He then showed an example of audio deepfakes that accurately replicated the voices of various US Presidents.
Lee explained that audio manipulation technologies come in two forms — deepfakes and cheapfakes. Deepfakes are more sophisticated, in which audio is modified or synthesized using AI, whereas cheapfakes use low-tech methods.
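As an illustrative example of that low-tech end of the spectrum (the speakers did not name a specific method, so this is an assumption), a cheapfake can be as crude as slowing a clip down so the speaker sounds slurred, with no AI involved at all. The sketch below uses the open-source librosa and soundfile libraries.

```python
# A "cheapfake": no machine learning, just a crude time-stretch of the audio.
import librosa          # pip install librosa soundfile
import soundfile as sf

def slow_down_clip(in_path, out_path, rate=0.8):
    """Time-stretch an audio clip; a rate below 1.0 slows it down."""
    audio, sample_rate = librosa.load(in_path, sr=None)   # keep native rate
    stretched = librosa.effects.time_stretch(audio, rate=rate)
    sf.write(out_path, stretched, sample_rate)

slow_down_clip("speech.wav", "speech_slowed.wav")  # hypothetical file names
```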
Current Use in Criminal Activity
While there is still plenty of room for improvement in deepfake technologies, the presenters outlined examples of ways these are already used to commit fraud and other types of crime.
"Deepfakes are becoming more realistic, they're becoming faster to make and they're becoming more accessible"
Lee stated: “Deepfakes are becoming more realistic, they’re becoming faster to make and they’re becoming more accessible. And that’s a problem because deepfakes are a powerful tool for weaponization.”
This particularly relates to the use of audio deepfakes to create synthetic voices. In one example, from around two years ago, fraudsters used AI to mimic a company’s CEO during a phone call, convincing an executive at the firm to wire $243,000 into a scam account.
In a child custody case, one parent created synthetic audio of the other parent making threats, in an attempt to win the case. Thankfully, forensic experts were able to identify the voice as synthetic.
Future Use
Lee and Maor offered some potentially dystopian scenarios in which deepfakes could be used in the future. These included nation-states spreading misinformation to further their geopolitical goals. For example, a country that wanted two others to go to war with each other could create a deepfake video of a world leader saying something inflammatory enough to provoke an attack.
Another scenario given was impersonating a high-profile figure in government or business and making false statements to cause a stock market crash. These kinds of uses led the FBI to issue an alert about deepfakes earlier this year, in which it highlighted their potential use by nation-state adversaries. It stated: “Malicious actors almost certainly will leverage synthetic content for cyber and foreign influence operations in the next 12-18 months. Foreign actors are currently using synthetic content in their influence campaigns, and the FBI anticipates it will be increasingly used by foreign and criminal cyber actors for spearphishing and social engineering in an evolution of cyber operational tradecraft.”
Another concerning outcome of enhancements in deepfake technologies could be the so-called “liar’s dividend.” As deepfakes become more convincing, it will become increasingly plausible for anyone caught on video or audio committing wrongdoing to claim that the evidence has been faked. “Seeing is believing will soon become seeing is no longer believing,” explained Lee. “All the things that media has helped us confirm, that foundation is getting chipped away unless we do something to detect deepfakes.”
Detecting Deepfakes
Developing methods to detect deepfakes accurately is therefore essential. Lee and Maor said that currently available detection tools can identify deepfakes with around 95% accuracy, but warned that constant improvement will be needed to stay ahead of evolving attackers.
High-level tools use machine learning to identify “semantically meaningful” features, such as unnatural head poses. Lee explained: “You can train a machine learning model to understand those types of movements, and when you understand those movements that are unique to individuals and compare it to a deepfake, the machine learning algorithm is able to detect the difference between the two.”
In contrast, low-level methods tend to uncover unnatural artifacts in faces that are not always perceptible to the human eye.
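A rough sketch of the high-level approach Lee describes might look like the following. It assumes that head-pose statistics (summaries of yaw, pitch and roll per video) have already been extracted with a landmark detector and saved to a CSV file, and it trains a simple scikit-learn classifier to separate genuine footage from deepfakes. The file name, column layout and feature choice are assumptions made purely for illustration.

```python
# Train a real-vs-fake classifier on "semantically meaningful" features,
# here per-video head-pose statistics assumed to be precomputed.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train_pose_detector(csv_path):
    """Fit a classifier on head-pose statistics and report held-out accuracy."""
    # Assumed columns: yaw_mean, yaw_var, pitch_mean, pitch_var, roll_mean,
    # roll_var, plus a 0/1 "label" column where 1 marks a deepfake.
    df = pd.read_csv(csv_path)
    X = df.drop(columns=["label"]).to_numpy()
    y = df["label"].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    return clf

detector = train_pose_detector("head_pose_features.csv")  # hypothetical file
```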
For most people who do not have access to these types of detection tools, some basic rules can be followed to avoid being duped by deepfakes. Primarily, this involves adopting a skeptical mindset and taking steps to satisfy yourself that the media is genuine. Lee advised: “If you do see a suspicious piece of media, verify the media source before sharing, so identify who is actually sharing it. Is it a reputable source? Then try to find some other evidence to confirm or to verify what you’re seeing.”
The use of deepfakes is a tactic growing in prominence, and before long, it will have the potential to cause serious societal issues. Developing detection technologies that stay ahead of malicious actors, and educating people not to trust everything they see online, will be critical to ensuring this looming threat does not have disastrous outcomes.