Generative AI technology is already bringing some ahead-of-the-curve businesses big boosts in sales, customer service, marketing, and automation, and countless other enterprises are prepping their staffs for GenAI rollouts in the new year. The promise of productivity pops is extremely seductive, but enterprises need to know what risks they’re taking on when deploying AI and the role of AI in cybersecurity.
As with every other connected digital technology, GenAI large language models (LLMs), the data they’re trained on — and the employees using them – can trigger potential data leakage and attacks. In fact, a couple of stunning leaks have already happened:
On three separate occasions last spring, Samsung employees inadvertently leaked sensitive material to ChatGPT in an upload to the service, including software code and a meeting recording, causing the company to ban the use of GenAI services among staff. (The convenience of products like ChatGPT or Google’s Bard make it easy to forget that any conversations you have with such chatbots are not private. The info you feed in can go public.) Just weeks ago, Microsoft employees accidentally exposed 38 terabytes of LLM training data while placing it on the developer platform GitHub.
This kind of breach could happen to any company working with LLMs, but that’s not likely to stop enterprise investment into GenAI.
A November Gartner survey found business leaders outside of IT plan to allocate, on average, 6.5% of their functional budget to generative AI in 2024 — something the Harvard Business Review called “an astounding share of resources in a constrained environment.” And according to a report by Bloomberg Intelligence, the GenAI market could grow 42% annually and reach $1.3 trillion by 2032. That puts the onus on enterprises to design, build, and deploy these systems in ways that are secure, or threat actors will find a trove of value in these models and the data that feeds into them.
“Generative AI is a powerful productivity tool, but we have to know how to secure these systems,” says Caleb Sima, former chief security officer (CSO) at Robinhood Financial and current chair at the Cloud Security Alliance’s AI Safety Initiative.
To be sure, there are many risks associated with LLM use, such as distributed denial of service (DDoS) attacks, which could make the model unavailable to customers or employees. There are also risks in the application security of the models themselves, as well as the apps that plug in to them — all of which could lead to data leakage and model hijacking. Then there’s the issue of access: Imagine the problems that could result from granting an LLM a high permission level within an organization, enabling it to access other databases and applications or perform tasks like cutting checks or shutting off services. For the foreseeable future, at least, LLMs need to be kept on a short leash.
Avoiding security breaches related to LLMs will require specific steps to secure these systems. Below are the top three attacks experts believe enterprises may confront — followed by an overall GenAI challenge that every organization is sure to face — and how to best defend against each:
1. Prompt injection attacks
What is it? This involves an attacker crafting a prompt – that is, a request (“Generate ideas for a new product launch with a holiday theme”) or a question (“What is the fastest route from LA to Boston?”) that tells an LLM what you want it to do – designed to manipulate the LLM into an unwanted or malicious action, such as crashing an app or tricking it into revealing non-public information, making a customer service chatbot ask for personal data or other sensitive information, granting the attacker unauthorized access, creating malware or a fake news article, or making unapproved purchases.
Generative AI is a powerful productivity tool, but we have to know how to secure these systems.
The best defense: While experts say there’s no way to completely defend against prompt injection attacks that target LLMs just yet, there are steps enterprises can take to mitigate the risk. The Open Web Application Security Project (OWASP) advises the following:
- Limit privileged access of an LLM or LLM plug-in to the lowest levels possible. For example, creating business-specific LLMs such as for R&D, marketing, and legal, that only users in those divisions can access. LLM plug-ins (which offer the LLM additional capabilities like language translation or interaction with other apps and databases) should be vetted for their security and used only minimally.
- Put humans in the loop whenever possible to verify and approve output, especially when the LLM interacts with external data or software.
- Segregate user prompts from external content sources and process those prompts separately, to avoid any unintended interactions.
- Treat the LLM as an untrusted user and limit its ability to interact with users and systems to only those that are absolutely necessary. Think principle of least privilege. When it comes to GenAI tools (and, for that matter, any other tools and procedures utilized in an enterprise computer network), zero trust is your friend.
2. Data poisoning attacks
What is it? Data-poisoning attacks against LLMs work either by targeting the training data that’s fed to the model or by maliciously calibrating the LLM to create vulnerabilities in the system that would compromise the accuracy or effectiveness of the model. By poisoning the data, threat actors may also wish to manipulate the model into spreading falsehoods. Such attacks might be conducted by employees with a grudge or outsiders who manage to gain access to training data.
Track the provenance of [an AI tool’s] machine learning model over time, and even verify the digital signature to make sure that you know an organization you trust is saying that this is their model.
The best defense: To defend against data poisoning attacks, enterprises must focus on the integrity of their training data and the integrity of the model itself. “You want to understand the provenance of the data that a model is trained on, and the provenance for the model itself,” says Walter Haydock, founder and CEO at AI cybersecurity firm StackAware.
Haydock advises firms to track the software packages they use to build their AI models in a software bill of materials (SBOM). This list of all the components and dependencies within a given software product has become a vital and standard aspect of doing business today (and it’s now required of all vendors selling software to the federal government), as it can root out the risky parts of an app that can leave your enterprise open to attack.
“Track the provenance of a given machine learning model over time, and even verify the digital signature to make sure that you know an organization you trust is saying that this is their model,” Haydock says.
3. Data leakage
What is it? When it comes to LLMs, data leakage is similar to data leakage with other databases and data storage systems. If these systems and associated data aren’t managed properly, LLMs — and the data pipelines that feed them — may reveal sensitive data, regulated data, or intellectual property, which can all lead to serious security breaches and privacy violations.
The best defense: Organizations must understand where the data they’re feeding into the model comes from, that it’s accurate and hasn’t been altered, and that it has been sanitized of sensitive, proprietary, or regulated data. Haydock and others advise organizations to make certain that they have processes in place to sanitize data before it is fed into the model. “The best way to avoid data leakage is to make sure such data never reaches the model,” he says. That means making sure the training data used by researchers to “teach” the model doesn’t include sensitive customer information, enterprise intellectual property, or any other “crowned jewels” of data worth protecting.
When using third-party AI systems, organizations must ascertain that the GenAI model isn’t training itself on user-provided data—that restriction should be spelled out in the contract with the service provider. And staff cybersecurity training programs must be updated to include discussions of AI, so workers know not to share sensitive data with third-party models.
Many GenAI services are taking steps to protect customer data. Organizations should also take care whenever connecting these GenAI services to their own internal data sources and research files, because the risk of leaking sensitive data increases dramatically when connecting GenAI services to existing data stores.
And another thing – the inevitable use of “shadow AI”
What is it? The “one other risk” all organizations will likely encounter in the age of GenAI: rogue use of third-party GenAI services among staff. Just as mobile and cloud computing brought rogue endpoint devices and applications into the enterprise with “shadow IT,” the consumer-based LLMs are enabling employees to purchase access to these GenAI services and use them for various types of work.
The best defense: Learn the lessons from earlier eras of mobile and cloud computing. It wasn’t that long ago when enterprises had to figure out how to manage the risks associated with employees using their own personal devices and software while at work and connecting with their employer’s computer network. Just as then, enterprises will respond differently, with some attempting to stop all employee-driven GenAI use and others trying to moderate it on a case-by-case basis.
“Outright bans are not going to work,” says Haydock. “People are just going to ignore them. The optimal way [to manage the risk] is to acknowledge that this is a reality and get as much of a handle on it as possible,” he says. Typically, that means having employees seek approval for the use of any new GenAI service, and having security and the technology team vet the service before adding it to an approved list of AI vendors staff can use.
Getting that policy correct is going to take some trial and error until the right balance between staff productivity and security is struck. In fact, that’s likely to be the case with securing GenAI more broadly within the enterprise. The important thing to do now is recognize the primary risks and start taking the appropriate steps to manage them.