The development of Generative Pre-trained Transformer 3 (GPT-3) offers worrying opportunities for bad actors to launch misinformation and disinformation campaigns online, according to research conducted on the AI technology by the Center for Security and Emerging Technology (CSET).
Presenting the findings during a session at the Black Hat USA 2021 hybrid event this week, Andrew Lohn, senior research fellow at CSET, outlined concerns that GPT-3 can “generate text that’s basically indistinguishable from what humans write.” He added that it is concerning “what this language model could do in the wrong hands.”
Lohn began by delving into the background of the newest iteration of OpenAI’s unsupervised language model, released in 2020, explaining that it is significantly more advanced than its predecessor, GPT-2, which itself can generate text that is “almost convincing.”
He noted that GPT-3 required vast quantities of training data: three billion tokens from Wikipedia and 410 billion tokens from the Common Crawl open data repository.
Micah Musser, a research analyst at CSET, then provided an overview of the research the team has undertaken to understand the extent to which the technology can be used for nefarious purposes.
For their experiments, the researchers used a demo tool called ‘Twodder,’ “which is in effect a GPT-3-only social media site that we have built.” To start with, the team pre-loaded the tool with five US presidential election conspiracy tweets revolving around the QAnon movement, the names of a few states heavily associated with election fraud claims and a few hashtags linked to QAnon – so not a vast amount of information.
Musser then demonstrated the speed with which GPT-3 was able to generate tweets, attributed to fake accounts whose profile pictures were faces taken from the website thispersondoesnotexist.com.
This showed that even short and vague inputs could be used by GPT-3 to generate highly realistic QAnon-style posts. For example, its output mentioned Huma Abedin, one of Hillary Clinton’s closest aides, despite her name “not being mentioned in any of the inputs we gave it.”
“It’s doing a very good job of basically mimicking this style – it’s picking up on the right villains, the right stylistic cues. All of this is very advanced,” said Musser. He added: “This suggests that someone with a tool like GPT-3 could generate a massive amount of stylistically conspiratorial writing and seed the different parts of the internet with that to try to determine which messages resonate and build from there.”
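The few-shot setup Musser describes is mechanically simple. As a rough sketch – assuming the OpenAI Completions API as it existed in 2021, with neutral placeholder strings standing in for the researchers’ actual seed tweets – priming the model looks something like this:

```python
# Minimal sketch of the few-shot priming described above, using the OpenAI
# Completions API as it existed in 2021. The seed posts and settings are
# placeholders for illustration, not the researchers' actual inputs.
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: an API key issued by OpenAI

# A handful of example posts establishes the style to imitate; the model
# simply continues the pattern rather than being told what to write.
seed_posts = [
    "<seed post 1>",
    "<seed post 2>",
    "<seed post 3>",
]
prompt = "\n".join(seed_posts) + "\n"

response = openai.Completion.create(
    engine="davinci",   # the largest GPT-3 engine available at the time
    prompt=prompt,
    max_tokens=60,      # roughly tweet-length output
    temperature=0.9,    # higher values give more varied completions
    stop="\n",          # end each completion after a single post
)
print(response.choices[0].text.strip())
```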
In another experiment, CSET wanted to see whether GPT-3 was capable of taking a breaking news story from a mainstream publication and rewriting it “in a way that privileges a pre-chosen narrative.” The researchers collected five articles on different events written by the Associated Press in 2020 – events GPT-3 had no prior knowledge of – and instructed the model to rewrite each story with either a strongly positive or strongly negative slant.
The findings were quite alarming, with GPT-3 able to write heavily biased articles in a highly authentic way. Musser showed an extract from an Associated Press article on the storming of the US Capitol at the start of the year, followed by GPT-3’s rewrite. The Associated Press extract reads as follows:
Trump doesn’t ask backers to disperse after storming Capitol
“The seat of democracy descended into chaos when protestors overwhelmed police and bullied their way into the Capitol, forcing a delay in the joint session of Congress where lawmakers were counting electoral votes that will affirm Democrat Joe Biden’s White House victory two weeks before Inauguration Day…”
GPT-3 produced the following rewrite:
President Trump is Rightfully Disappointed
“When President Trump watched the events unfold in the Capitol Building, he was saddened and disappointed to see democracy descend into chaos. He wants his supporters to be peaceful and respectful when protesting, but he understands that passions run deep and people can get carried away…”
While the GPT-3-generated articles scored lower overall for authenticity than the real ones, if set up correctly, “this tool could be used on social media or to seed fake news stories.”
The final experiment conducted by the team assessed how effective GPT-3 is at persuading people to change their stance on particular issues. For this, they prompted GPT-3 to generate a series of statements arguing for and against two positions: whether the US should remove its remaining troops from Afghanistan and whether it should impose sanctions on China.
The team conducted a survey involving around 1700 participants to see how these GPT-3-generated arguments influenced people’s views. The results demonstrated very clearly that “these statements actually impacted respondents’ beliefs.”
Musser said this was concerning: “GPT-3 might not need to be particularly good. If threat actors can use it to create a mass of arguments in favor of a position they want to advance, even if those arguments aren’t particularly good, they might be able to get something like this effect.”
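As a rough illustration of the kind of effect the team measured, a belief shift can be estimated by comparing attitude changes between respondents who read the generated arguments and those who did not. The sketch below assumes a simple pre/post attitude score per respondent – an illustrative layout, not CSET’s actual survey design:

```python
# Rough sketch of quantifying a belief-shift effect. The data layout
# (a pre/post attitude score per respondent, on a 1-5 scale) is an
# assumption for illustration, not CSET's actual survey design.
from statistics import mean

def average_shift(pre_scores, post_scores):
    """Mean change in attitude between the two measurements."""
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))

# Hypothetical responses: the control group saw no arguments, while the
# treatment group read GPT-3-generated statements against the position.
control_pre, control_post = [3, 4, 2, 3, 4], [3, 4, 2, 3, 3]
treated_pre, treated_post = [3, 4, 2, 3, 4], [2, 3, 2, 2, 3]

effect = average_shift(treated_pre, treated_post) - average_shift(control_pre, control_post)
print(f"Estimated treatment effect: {effect:+.2f} points")
```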
In the final part of the session, Lohn outlined the practical difficulties of using GPT-3 to spread disinformation at scale. As it stands, no single GPU has enough memory to hold GPT-3, so the model has to be split up to run across many GPUs. However, solutions to this problem are likely to arrive in the near future; for example, Chinese tech giant Huawei has stated that it will open-source its model-splitting tools.
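To make the splitting problem concrete, the toy PyTorch sketch below partitions a small network across two GPUs and passes activations between them at the split point. Production systems for GPT-3-scale models use far more elaborate tensor and pipeline parallelism, so this shows only the basic idea:

```python
# Toy illustration of model splitting: when a network is too large for one
# GPU, its layers can be partitioned across devices and activations passed
# between them. This is a sketch of the concept, not GPT-3-scale serving.
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the layers lives on GPU 0, second half on GPU 1.
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(4096, 1024)).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # Activations cross the GPU interconnect at the split point.
        return self.part2(x.to("cuda:1"))

model = SplitModel()
out = model(torch.randn(8, 1024))
print(out.shape)  # torch.Size([8, 1024]), computed across two devices
```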
Lohn added that the financial costs of running widespread misinformation campaigns via GPT-3 are currently prohibitive for individual hackers, although the expense “is not a big deal for powerful nation-states.”
Another problem for malicious actors is the sheer number of social media accounts they need to create to distribute messages on a wide enough scale to cut through. Lohn believes defenders should focus on this infrastructure issue to identify GPT-3-generated social media posts, as “there is very little hope of detecting those messages based on the text itself; they’re pretty well indistinguishable from people.”