Should Generative AIs Have Safeguards?

Romain Aviolat, CISO of Kudelski Security, shares his thoughts on Llama Guard and explains why, for cybersecurity reasons, companies using LLMs should simultaneously implement safeguards to prevent the generation of insecure or private content.(iStock)

by Camille Rustici

February 15, 202414 minsUpdated on February 16, 2024

With the development and increasing use of generative AIs, should safeguards also be developed? Last December, Meta launched Llama Guard, a security tool that classifies the safety of LLM prompts and responses, using a taxonomy of risks. Romain Aviolat, CISO of Kudelski Security, shares his thoughts on Llama Guard and more generally how such tools are important to protect LLMs and ensure their security, accuracy, and ethical behavior.

Kudelski Security, a division of the Kudelski Group, provides services covering the entire spectrum of cybersecurity (offensive security, incident response services, and consulting services around blockchain, AI, and quantum). They have a global presence in various industries, from healthcare to manufacturing, energy distribution, finance, and pharmaceuticals.

Romain Aviolat, who serves as the Chief Information Security Officer at Kudelski Security, provides insights into Llama Guard and underscores the importance for companies using LLMs to implement safeguards concurrently, particularly for cybersecurity purposes.

You have developed AI consulting services. What do they consist of?

Romain Aviolat: “We assist clients in developing safeguards for AI. This involves several aspects. Primarily, it encompasses advisory services, such as guiding clients on governance, risk management, architecture, and threat modeling for LLM applications.

It can also involve offensive security. We are seeing an increasing number of clients requesting assessments of their LLM security. This entails testing a typical web or mobile application, attempting to manipulate the prompt to extract information that should never be revealed by these LLMs.

This is the primary and classic use case, as prompt engineering is a prevalent vulnerability. However, it can also involve testing the model’s resilience, determining if the LLM can be overwhelmed by excessive requests, akin to testing a conventional application’s capabilities.”

To what do you attribute the rise in demand for offensive security services among your clients?

Romain Aviolat: “I believe it is directly linked to the widespread integration of generative AI across all businesses. Now, processes are being replaced, enhanced, or streamlined thanks to generative AI, raising numerous cybersecurity concerns.

In terms of risk for our clients, we anticipate seeing an increase in the quality and speed of cyberattacks, even full automation. This includes personalized attacks, phishing attempts, and the use of fake audio-video content for social engineering purposes. It’s not necessarily directly related to the security of the LLM itself but rather how LLMs can be utilized by threat actors.”

Is generative AI used for cyberattacks?

Romain Aviolat: “It’s challenging to determine whether recent cyberattacks are generally attributed to threat actors’ adeptness at adopting such technology before industries do. Thus, it’s safe to say that yes, it’s something they use and will increasingly utilize. However, quantifying its usage is difficult. It’s unclear if a cyberattack has been conducted using these systems. But we see improvements in the quality and volume of phishing attacks, suggesting a correlation with the proliferation of these systems.”

OWASP (Open Worldwide Application Security Project) has unveiled its top 10 language model vulnerabilities. How do safeguards like Llama Guard help protect against these risks?

Romain Aviolat: “OWASP was well-known for its top ten for web applications, so they created this top ten for language models last year as well to address the growing security concerns surrounding generative AI. OWASP’s top ten ranks vulnerabilities by their popularity. Prompt injection tops the list.

Without safeguards, we expose ourselves to the top ten vulnerabilities. Today, we know that many major vulnerabilities we’ve seen in the news, such as data leaks through language models, often occurred because we were able to manipulate the prompt of a language model. So clearly, positioning a safeguard at the input and output of language models, I think, is critical. It needs to be done.

However, in terms of limitations, we always say that in security, there’s no silver bullet. Llama Guard, for example, covers part of the work, only two vulnerabilities out of the top ten. It covers the input and output, so prompt injection and insecure output ending. It can also cover, to some extent, the sensitive information disclosure. Many things won’t be covered by Llama Guard, for example, denial of service, poisoning training models, the fact that models tend to hallucinate if they’re not properly framed and generate too much information or false information.

I think, as with traditional application security, no one solution will cover all risks. Meta specializes in analyzing what goes in and out of the language model. But other criteria come into play that it won’t be able to cover at all.

For example, number five in the top ten is supply chain vulnerability. These are vulnerabilities that are inherited from third-party code. Llama Guard doesn’t know your code; it just knows what goes in and out of the language model. It’s just a collection of vulnerabilities, and there isn’t a single tool that will fix everything.

If I may draw a parallel with a standard application that’s not a language model, it’s more or less the same thing. We’ll put a web application firewall in front of an application to control what goes into the application, to ensure we don’t have SQL injection and such. Well, it’s more or less the same thing. But the web application firewall won’t be able to fix problems with third-party libraries that have vulnerabilities in the code.”

OWASP’s Top 10 Vulnerabilities in Large Language Models (LLMs)

LLM01: Prompt Injection
Description: Manipulating LLMs via crafted inputs can lead to unauthorized access, data breaches, and compromised decision-making.

LLM02: Insecure Output Handling
Description: Neglecting to validate LLM outputs may lead to downstream security exploits, including code execution that compromises systems and exposes data.

LLM03: Training Data Poisoning
Description: Tampered training data can impair LLM models leading to responses that may compromise security, accuracy, or ethical behavior.

LLM04: Model Denial of Service
Description: Overloading LLMs with resource-heavy operations can cause service disruptions and increased costs.

LLM05: Supply Chain Vulnerabilities
Description: Depending upon compromised components, services, or datasets undermine system integrity, causing data breaches and system failures.

LLM06: Sensitive Information Disclosure
Description: Failure to protect against disclosure of sensitive information in LLM outputs can result in legal consequences or a loss of competitive advantage.

LLM07: Insecure Plugin Design
Description: LLM plugins processing untrusted inputs and having insufficient access control risk severe exploits like remote code execution.

LLM08: Excessive Agency
Description: Granting LLMs unchecked autonomy to take action can lead to unintended consequences, jeopardizing reliability, privacy, and trust.

LLM09: Overreliance
Description: Failing to critically assess LLM outputs can lead to compromised decision making, security vulnerabilities, and legal liabilities.

LLM10: Model Theft
Description: Unauthorized access to proprietary large language models risks theft, competitive advantage, and dissemination of sensitive information.

Could we have tools for each of these vulnerabilities?

Romain Aviolat: “We may not necessarily have tools, but we’ll have mitigations that we can implement. So for the ten mentioned, there are methods, things we can do to mitigate them. So some tools like Llama Guard will allow us to mitigate two out of ten. But if I take the training data poisoning part, which is number three, well, we can put controls in place on the training data to ensure that we don’t poison the model.

But, you know, it requires putting a process in place, and it’s not as simple as just putting Llama Guard at the input and output of our language model. So there are things that will be more costly. I think all can be mitigated, but then it’s a risk assessment, a cost-benefit analysis that needs to be taken into account for each of these vulnerabilities.”

What do you recommend to your clients using language models?

Romain Aviolat: “We’ll, of course, talk to them about having a safeguard that acts as a firewall around the language model to mitigate prompt injection and output problems. It could be Llama Guard, but it could also be a solution that cloud providers offer in addition to their language model services. So Llama Guard isn’t the only solution in this market.

What we’ll do is, like with a traditional application, we’ll conduct a threat modeling exercise to identify risks. We’ll perform an architecture review and try to understand in the context of the company or application where effort needs to be placed or where additional security controls need to be added. And maybe we’ll realize that the effort we’ve put into securing the input and output is sufficient and that the likelihood of other vulnerabilities materializing is too low for there to be a need to do anything. It’s about risk analysis and decision-making.

There is also the question of cost. Implementing safeguards can have a financial impact, but also an operational one. It might slow down the application. Llama Guard, for example, will typically slow down your language model because we’re adding one language model to control another. So there will be a financial cost and an operational cost in terms of user experience.”

Regarding the data breach that France experienced a few weeks ago, where 33 million French citizens had their social security data stolen, what was your reaction?

Romain Aviolat: “Well, am I surprised? Not at all. Is it new? No. Will there be more in the future? Yes, that’s for sure. France was hit hard by this attack, but the United States also suffered a lot the previous year. We also have major financial organizations, insurance companies, and investment funds that faced the same issues. We need to be more responsive, to invest our money better in protection systems, to be quicker in detection, prevention, and response. Cyberattacks are no longer a matter of hours, weeks, or months; they’re a matter of minutes. Our systems and the people who maintain them need to be much more responsive.

Perhaps artificial intelligence will also help with the response. We already see systems or solutions emerging in the market, co-pilots that can take responses on behalf of humans. Internally, we’ve already implemented these kinds of systems to automate responses. This doesn’t always involve AI; it’s often deterministic systems. But we believe that yes, generative AI will also assist the blue team.

Unfortunately, in reality, it’s often the human factor that is at stake. A simple identity theft or maybe an ill-adapted security control. So today, companies should be able to mitigate these stolen password issues. Technologies exist; we know how to protect ourselves against phishing once and for all. But implementing these technologies will take time, and in the meantime, we’ll have to deal with it.

That’s why I’m not overly optimistic for the future until we’ve improved this human aspect or given better security control to humans. Because for now, it remains a bit of a weak point in computer systems, unfortunately.”

Should Generative AIs Have Safeguards?

by Camille Rustici

You have developed AI consulting services. What do they consist of?

To what do you attribute the rise in demand for offensive security services among your clients?

Is generative AI used for cyberattacks?

OWASP (Open Worldwide Application Security Project) has unveiled its top 10 language model vulnerabilities. How do safeguards like Llama Guard help protect against these risks?

Could we have tools for each of these vulnerabilities?

What do you recommend to your clients using language models?

Regarding the data breach that France experienced a few weeks ago, where 33 million French citizens had their social security data stolen, what was your reaction?

Related articles

Full Order Books, Stretched Capacities: How to Rethink Ramp-Up in Aerospace & Defense

Agri-Robot Sets New Ploughing Record in Half Time a Driven Tractor Would Take

Racing to Replace the ISS: Four European & U.S. Startups Leading the Next Orbital Era

How 3D Printed Electronics Are Accelerating Quantum Sensor Innovation at the University of Stuttgart

China’s Forklift Manufacturers Are on the Rise: What Buyers Should Know

Innovation and Collaboration – OMRON Launches a New Facility in Stuttgart