Prompt Engineering Sucks
84% of engineers agree
We looked at over 2,000 posts from AI engineers on OpenAI's forum and noticed a common theme: prompt engineering is really frustrating to work with.
Prompt Engineering is a pain. Here’s why
An engineer creates an assistant bot
They notice the bot often goes off-topic, speaks in a toxic way and hallucinates.
They try prompt engineering to fix these reliability issues
They apply various techniques, such as zero-shot prompting and chain-of-thought prompting.
Nothing works, frustration grows
After days of attempts, getting more frustrated by the minute, they turn to other engineers online to see if anyone has a better solution.
The project is put on hold
Not wanting to waste any more time, and with no concrete solution in sight, they put the project on hold or abandon it.
Findings from AI engineers
The reasons why prompt engineering sucks
89.5%
find prompt engineering moderately or very hard to work with.
84.3%
find that prompt engineering never, or only occasionally, helps them achieve their desired outcome.
91.4%
have not tried alternative methods or techniques besides prompt engineering.
1,400+ people
struggle to reach their desired outcome with prompt engineering.
"I tried the following below [to make the result a fixed word count] but it hardly ever reaches results in these limits. Any suggestions"
"Is there any way to stop this problem?"
"Any tips or techniques to reduce these hallucinations?"
"I still can’t figure out WHY it’s not following basic instructions."
"Hi everyone, sorry to bother. I’m struggling with a “Prompt” for my chatbot."
There’s a smarter way to work. It begins with Guardrails
When trying to make your AI behave in a certain way, AI guardrails offer a more effective, time-efficient, and intelligent approach than prompt engineering. Here are some real-life examples of how developers who struggle with prompt engineering could use AI guardrails to solve their issues.
Story 1
I’ve been playing with the idea of a child-oriented Q&A AI. An important aspect of it, would be to purposely avoid “hot” topics... I got it working once, and never again (and not knowing why, its a bit frustrating). I tried davinci and davinci instruct. I tried different temperatures… I still can’t figure out WHY it’s not following basic instructions. (OpenAI Forum)
Story 2
I’m trying to develop a simple text completion app with GPT-3 for both English and Spanish. Although GPT-3 ... works perfectly in English, it sometimes answers in English to a prompt given in Spanish.
Story 3
I’m using GPT3 to write passages of text to use in emails or blogs. I’m getting good results but I want GPT3 to cut out.. using too many hedge words / weasel words and sound more decisive in its output.
Story 4
I am a junior developer and I use ChatGPT a lot as a learning tool, as if I had a senior available to answer my questions. The problem is that most of the time, when I ask a simple question, I get a response that is way too long and almost always includes code that I didn’t ask for...It is becoming extremely frustrating to have to stop its responses every 3 messages and repeatedly ask it to go step by step, simply and without code… Is there a way to stop this problem?...
Story 5
I want to train a model using fine tune, so that this model can reply to mails as per the polices and structure of my company...after fine tuning is complete i am using this new model in play ground and giving same prompt as above like what is the mane of company but answer is not correct
Story 6
I’ve been using ChatGPT to process data, and I often work with large tables containing around 5000 rows. When I combine this with a detailed prompt of about 200 words, I sometimes encounter issues where the model generates inaccurate information or “hallucinations.” Could anyone advise on the best practices for managing such large datasets and complex prompts in ChatGPT?...
How do AI Guardrails work?
Sitting between your AI and the user, guardrails vet every message, ensuring it abides by the rules you set. Should a message from either the user or the LLM violate a guardrail, the message is blocked, overridden, or rephrased in real time.
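In code, a guardrail layer is simply a checkpoint that every message passes through on its way in and out. The sketch below is a minimal illustration of that idea, assuming hypothetical check functions (`is_off_topic`, `is_toxic`) and a `call_llm` callable standing in for your model; a production guardrail product would rely on trained classifiers, richer policies, and much tighter latency budgets.

```python
# Minimal sketch of a guardrail layer sitting between the user and the LLM.
# The checks below are hypothetical keyword rules, used only for illustration.

from dataclasses import dataclass


@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""


def is_off_topic(message: str) -> bool:
    # Hypothetical topic rule: block anything touching these subjects.
    banned_topics = ("politics", "medical advice")
    return any(topic in message.lower() for topic in banned_topics)


def is_toxic(message: str) -> bool:
    # Hypothetical toxicity rule: a real guardrail would call a classifier here.
    toxic_terms = ("idiot", "stupid")
    return any(term in message.lower() for term in toxic_terms)


def apply_guardrails(message: str) -> GuardrailResult:
    """Vet a single message (user input or LLM output) against the rules."""
    if is_off_topic(message):
        return GuardrailResult(allowed=False, reason="off-topic")
    if is_toxic(message):
        return GuardrailResult(allowed=False, reason="toxic")
    return GuardrailResult(allowed=True)


def guarded_reply(user_message: str, call_llm) -> str:
    """Check traffic in both directions: user -> LLM and LLM -> user."""
    inbound = apply_guardrails(user_message)
    if not inbound.allowed:
        return f"Sorry, I can't help with that ({inbound.reason})."

    llm_output = call_llm(user_message)

    outbound = apply_guardrails(llm_output)
    if not outbound.allowed:
        # Override the violating response with a safe fallback.
        return "Let me rephrase that in a way that fits our guidelines."
    return llm_output


if __name__ == "__main__":
    def echo_llm(prompt: str) -> str:
        # Stand-in for a real LLM call.
        return f"You said: {prompt}"

    print(guarded_reply("Tell me about politics", echo_llm))       # blocked
    print(guarded_reply("How do I reset my password?", echo_llm))  # allowed
```

The key design point is that the same `apply_guardrails` check runs on both the user's message and the model's response, so violations are caught no matter which side produces them.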
What makes a Great guardrail solution?
| Feature | What makes a Good guardrail solution? | What makes a Great guardrail solution? |
| --- | --- | --- |
| Latency | Less than 750 ms | Less than 350 ms |
| F1 score | More than 0.70 | More than 0.90 |
| Precision | More than 0.80 | More than 0.90 |
| Recall | More than 0.70 | More than 0.80 |
| Range of out-of-the-box guardrails | Up to 10 | Over 20 |
| Customization | Minimal or limited customization | Fully customizable, with the ability to create your own policies |
| Customer service | No or minimal customer service | Direct Slack chat with a Solutions Architect |
| Observability | Limited or basic observability, including some analytics | Advanced observability, including live tracking of every message sent and real-time analytics |
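For context on the F1, precision, and recall rows above: these scores come from comparing the guardrail's block/allow decisions against human-labelled messages. A minimal sketch of that calculation, using made-up example data:

```python
# Minimal sketch: scoring a guardrail against hand-labelled messages.
# The evaluation data below is hypothetical.

def precision_recall_f1(predictions, labels):
    """predictions/labels are booleans: True means 'message violates the policy'."""
    tp = sum(p and l for p, l in zip(predictions, labels))        # correctly blocked
    fp = sum(p and not l for p, l in zip(predictions, labels))    # blocked but harmless
    fn = sum(not p and l for p, l in zip(predictions, labels))    # missed violation

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical evaluation set: the guardrail flagged 4 messages, 3 were real violations.
predicted = [True, True, True, True, False, False, False, False]
actual    = [True, True, True, False, True, False, False, False]

print(precision_recall_f1(predicted, actual))  # (0.75, 0.75, 0.75)
```

In practice you would run this over a labelled test set of real user and LLM messages, and a great solution should clear the 0.90 precision and F1 bar shown in the table.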