Cybersecurity researchers have used data science to develop a new approach that makes automated phishing attacks almost as fruitful as spear phishing, which is typically far more time-consuming.
Presenting at the Black Hat conference in Las Vegas, John Seymour, a data scientist, and Philip Tully, a senior data scientist, both of ZeroFOX, described how they combined traditional natural language processing, histograms, and information parsed from user profiles to build a much more effective automated phishing campaign.
“Machine learning can be used offensively to automate spear phishing,” says Tully, as he presented the session “Weaponizing Data Science for Social Engineering: Automated E2E Spear Phishing”.
Phishing, along with its more targeted and time-consuming variant, spear phishing, typically lands at or near the top of any list of IT security concerns, Seymour points out. That is what drove these white-hat hackers to investigate new approaches in this area. The result of their research and development effort is SNAP_R, which stands for Social Network Automated Phishing with Reconnaissance.
“One of the first questions we get is always ‘why is social media, and Twitter in particular, such a good vector for spear phishing?’” Seymour says. Many attackers go after Twitter users, he says, because the site offers a good API for scraping data. Twitter posts also tend to use colloquial, informal grammar, and the short character limit helps obfuscate payloads, he adds.
“There’s also a trusting culture. No one suspects their social networks of harboring negative content,” Tully says. “And there’s this idea of incentivized data disclosure, which makes people want to share personal details about themselves.”
SNAP_R works by pulling posts from a Twitter user’s timeline and using that user’s most frequently used words and interests to customize fraudulent posts. It starts by prepending tweets the user posts. The tool also shortens the payload and triages targets based on their relative value and engagement on the platform. “It is designed to only target people who are likely to click on links based on their value and engagement,” says Tully.
Attacks built on the SNAP_R model would also rely on a believable profile that mixes in non-phishing posts. The tool reviews the timing of a target’s posts so that it can deliver its phishing posts when the target tends to be online and tweeting. It also uses two separate models, a Markov model and an LSTM (long short-term memory) network, in pre-training, looking at word-use frequency and distinguishing verified accounts from bots.
In the end, SNAP_R promises a click-through rate of 30% to 35%. That is not as good as slower, more personally customized spear phishing schemes, which are 45% successful, but decidedly better than the 5% to 14% that similarly automated phishing campaigns draw.