Close the backdoor: understanding rapid injection and minimizing risk

10 Min Read

Join us as we return to New York on June 5 to work with executive leaders to explore comprehensive methods for auditing AI models for bias, performance, and ethical compliance in diverse organizations. Find out how you can attend here.

New technology means new opportunities… but also new threats. And when the technology is as complex and unfamiliar as generative AI, it can be difficult to understand which is which.

Take the discussion surrounding hallucination. In the early days of the AI ​​rush, many people were convinced that hallucination was always an unwanted and potentially harmful behavior, something that needed to be eradicated completely. Then the conversation changed and the idea that hallucination could be valuable was introduced.

Isa Fulford from Open AI expresses this well. “We probably don’t want models that never hallucinate, because you can see it as a creative model,” she emphasizes. “We just want models that hallucinate in the right context. In some contexts it is okay to hallucinate (for example, when asking for help with creative writing or new creative ways to approach a problem), while in other cases it is not.”

This view is now the dominant view regarding hallucination. And now there’s a new concept that’s gaining popularity and causing a lot of fear: “Rapid injection.” This is generally defined as when users intentionally abuse or exploit an AI solution to create an undesirable outcome. And unlike most of the discussion about potential bad outcomes of AI, which often focuses on potential negative outcomes for users, this is about risks for AI providers.

Let me explain why I think much of the hype and fear surrounding rapid injection is overblown, but that doesn’t mean there isn’t a real risk. A quick shot should serve as a reminder that when it comes to AI, the risks go both ways. If you want to build LLMs that keep your users, your business, and your reputation safe, you need to understand what this is and how to mitigate it.

How rapid injection works

You can see this as the downside of the incredible, groundbreaking openness and flexibility of the AI ​​generation. When AI agents are designed and executed well, it truly feels like they can do anything. It can feel like magic: I just tell him what I want and he just does it!

The problem, of course, is that responsible companies don’t want to bring an AI into the world that really ‘does everything’. And unlike traditional software solutions, which often have rigid user interfaces, large language models (LLMs) provide opportunistic and ill-intentioned users with ample opportunities to test their limits.

You don’t have to be an expert hacker to try to exploit an AI agent; you can just try different prompts and see how the system responds. Some of the simplest forms of rapid injection are when users try to convince the AI ​​to bypass content restrictions or ignore controls. This is called ‘jailbreaking’. One of the most famous examples of this came back in 2016, when Microsoft released a prototype Twitter bot that quickly “learned” how utter racist and sexist comments. More recently, Microsoft Bing (now “Microsoft Co-Pilot) was successfully manipulated to give away confidential data about its construction.

Other threats include data extraction, where users try to trick the AI ​​into revealing confidential information. Imagine an AI banking support agent convinced to provide customers’ sensitive financial information, or an HR bot sharing employee payroll data.

See also  Google Pixel Fold 2 Release Date, Price and Specs Rumors

And as AI is asked to play an increasingly important role in customer service and sales functions, a new challenge arises. Users may be able to convince the AI ​​to give huge discounts or inappropriate refunds. Recently a dealer bot “sold” a 2024 Chevrolet Tahoe for $1 to one creative and persistent user.

How to protect your organization

Nowadays there are entire forums where people share tips for avoiding the guardrails surrounding AI. It’s a kind of arms race; exploits emerge, are shared online, and are then usually quickly shut down by the public LLMs. The challenge of catching up is a lot harder for other bot owners and operators.

There is no way to avoid all risks of AI abuse. Think of prompt injection as a backdoor built into every AI system that enables user prompts. You can’t completely secure the door, but you can make it much harder to open. Here are the things you need to do now to minimize the chance of a bad outcome.

Set up the right terms of use to protect yourself

Legal terms on their own obviously don’t provide protection, but it’s still critical that you have them in place. Your terms of use should be clear, comprehensive, and relevant to the specific nature of your solution. Don’t skip this! Make sure you enforce user acceptance.

Limit the data and actions available to the user

The surest solution to minimize risk is to limit what is accessible to only what is necessary. If the agent has access to data or tools, it is at least possible that the user can find a way to trick the system into making it available. This is the principle of least privilege: It’s always been a good design principle, but it becomes absolutely essential with AI.

Use evaluation frameworks

Frameworks and solutions exist to help you test how your LLM system responds to different inputs. It is important to do this before making your agent available, but also to monitor this on an ongoing basis.

See also  My health information has been stolen. What now?

This allows you to test for certain vulnerabilities. They essentially simulate fast injection behavior, allowing you to understand and close any vulnerabilities. The goal is to block the threat… or at least monitor it.

Familiar threats in a new context

These suggestions on how to protect yourself may sound familiar: For many of you with a technology background, the danger of rapid injection is reminiscent of the danger of running apps in a browser. While the context and some specifics are unique to AI, the challenge of avoiding exploits and blocking the extraction of code and data is similar.

Yes, LLMs are new and somewhat unknown, but we have the techniques and practices in place to protect against these types of threats. We just have to apply them properly in a new context.

Remember: this isn’t just about blocking master hackers. Sometimes it’s just about stopping obvious challenges (many “exploits” are simply users asking for the same thing over and over again!).

It is also important to avoid the pitfall of blaming rapid injection for unexpected and unwanted LLM behavior. It’s not always the users’ fault. Remember: LLMs demonstrate the ability to reason and solve problems, and to express creativity. So when users ask the LLM to accomplish something, the solution looks at everything available (data and tools) to fulfill the request. The results may seem surprising or even problematic, but chances are they’re coming from your own system.

The essence of rapid injection is this: take it seriously and minimize the risk, but don’t let it hold you back.

Cai GoGwilt is the co-founder and chief architect of Rock solid.

Share This Article
Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *