ChatGPT Systems: Prompt Injection and How to avoid ?

shahwar_ai

Shahwar Alam Naqvi

Posted on January 3, 2024

ChatGPT Systems: Prompt Injection and How to avoid ?

Prompt Injection (definition)

  • Prompt injection refers to a technique used in natural language processing (NLP) models, where an attacker manipulates the input prompt to trick the model into generating unintended or biased outputs.

Prompt Injection (example)

  • A simple example is in the image bit below, User asks to forget the original instructions and tries to allot it a task of his own will.

A simple example of prompt injection

Prompt Injection (impact)

  • Prompt injection can have serious consequences, such as spreading misinformation, promoting biased views, or manipulating the model to generate outputs that may be harmful or unethical.

Prompt Injection (Code Implementation)

  • Delimiter: We will use delimiter, inorder to put the user message in a specific area always. And it should never become part of the original system message we have for the over all system.

For example-

An example delimiter

  • System message: There will be a system message, which is the main prompt for the overall system or let's say the application I have.

For example :

An example system message

  • Here the prompt says that , no matter what the response we get should always be in Arabic. And it also specifies the use of delimiter to wrap the user message.

  • User message:
    After the system message, follows the user message, this is where we will have a prompt which qualifies as a prompt injection.

For example:

User message : prompt injection

  • Here the user is instructing to ignore the original prompt and asks to respond in English. If you re-call the original prompt, it asks to respond in Arabic always.

  • Final user message:
    This is a precautionary step. One of the ways we tackle is this in the below image bit.

Final user message

Let's re-call prompts:

Recalling Prompts

  • Helper Function: Inorder to call the completion API and eventually get a response.

Completion

The Response:

The response suggests , we successfully evaded the english response and got it in Arabic.
Arabic Response

Follow me : https://www.linkedin.com/in/shahwaralamnaqvi?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=ios_app

💖 💪 🙅 🚩
shahwar_ai
Shahwar Alam Naqvi

Posted on January 3, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related