Philip Smith

When your AI adopts poetic licence.



Nope, I’m not talking about my wayward teen years! I’m talking about hallucination in AI. It’s a topic I’ve been giving some thought to, so I was grateful to attend a session at the AWS NLP Conference 2023 yesterday discussing ways of mitigating its impact.


Hallucination is where an AI provides surplus or invented information with conviction, even though what it’s saying isn’t necessarily true. This behaviour centres on the generative side of AI, and it therefore affects when, where and how generative models get used.


Generally, generative AI is used to create new content based on the prompt it is given. The new content could be video, text or even audio, and a hallucination can turn up for a number of reasons. Sometimes, when asked to work through maths problems or solve what we might think are simple deduction puzzles, we’re faced with outcomes that are less than truthful.


Here are some examples from the Wikipedia page:

“When CNBC asked ChatGPT for the lyrics to "Ballad of Dwight Fry", ChatGPT supplied invented lyrics rather than the actual lyrics.[18] Asked questions about New Brunswick, ChatGPT got many answers right but incorrectly classified Samantha Bee as a "person from New Brunswick".[19] Asked about astrophysical magnetic fields, ChatGPT incorrectly volunteered that "(strong) magnetic fields of black holes are generated by the extremely strong gravitational forces in their vicinity". (In reality, as a consequence of the no-hair theorem, a black hole without an accretion disk is believed to have no magnetic field.)[20] Fast Company asked ChatGPT to generate a news article on Tesla's last financial quarter; ChatGPT created a coherent article, but made up the financial numbers contained within.[4]”


These examples highlight the way in which LLMs can ‘fabricate’ answers to sound more realistic. If we are too quick to accept the knowledge and wisdom from these tools as the new truth, we will ultimately begin to erode wisdom and knowledge in society and, much like the loss of literature, music or other creative works, we will lose the breadth of language and the truth of history.


We are moving so fast, though, that as soon as something becomes an issue, someone is bound to be working on a mitigation. The same is true of hallucinations, but many of the mitigations aren’t as simple as just asking GPT a question.


Prompt engineering is one way to help avoid these fabricated answers. Think of prompts as constraints, something I’ve spoken about before.


Zero-shot prompting, as in the Wikipedia excerpt above, doesn’t give the model much context or any examples of the expected outcome, so its answers carry a higher probability of hallucination. If it doesn’t know the answer, as with the Ballad, it can fabricate one that sounds reasonable. Zero-shot prompting should be reserved for much simpler tasks, like the classification example linked above, although in that case it may not even be necessary to use a Large Language Model (LLM).
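
To make that concrete, here’s a rough sketch of a zero-shot prompt using the OpenAI Python SDK. The client setup and model name are just illustrative assumptions, not a recommendation; the point is that the model gets a bare question with no context or examples to anchor it.

```python
# Zero-shot: a bare question with no context or examples.
# Assumes the OpenAI Python SDK (v1.x) with an API key in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[
        {"role": "user", "content": "What are the lyrics to 'Ballad of Dwight Fry'?"}
    ],
)
print(response.choices[0].message.content)
# With nothing to ground it, the model may confidently invent lyrics
# rather than say it doesn't know them.
```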


Few-shot prompting gives the model examples of what the expected result should look like, in terms of the format of the answer. In this case the user is essentially constraining the model to work within the context it has been provided.
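
As a sketch (same assumed SDK and illustrative model as above), a few-shot prompt supplies worked examples so the model sees the expected format, and can even be shown that ‘Unknown’ is an acceptable answer rather than a made-up one:

```python
# Few-shot: show the model the expected format before asking the real question.
from openai import OpenAI

client = OpenAI()

few_shot_prompt = """Classify the sentiment of each review as Positive, Negative, or Unknown.

Review: "Absolutely loved it, would buy again."
Sentiment: Positive

Review: "Arrived broken and support never replied."
Sentiment: Negative

Review: "qwzx 12345"
Sentiment: Unknown

Review: "The battery life is shorter than advertised."
Sentiment:"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[{"role": "user", "content": few_shot_prompt}],
)
print(response.choices[0].message.content)  # expected: "Negative"
```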


There are many other ways to improve the responses given by your chosen model; for more examples, there is an excellent resource in the Prompt Engineering Guide. You’ll see that as you iterate through prompts to get better results, you begin to look at more complex prompting techniques, such as Chain of Thought (CoT) and Tree of Thoughts (ToT), which compound the simpler methods to handle complex inputs.
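
As a hedged illustration of the simplest Chain of Thought style, the prompt below asks the model to reason through the steps before giving its answer; the scenario and numbers are made up purely for the example:

```python
# Chain of Thought: ask the model to reason step by step before answering,
# which tends to reduce arithmetic and deduction slips.
from openai import OpenAI

client = OpenAI()

cot_prompt = (
    "A cafe sells coffee at £3.20 and a pastry at £2.50. "
    "I buy two coffees and one pastry and pay with a £10 note. "
    "How much change do I get? Think through the steps before giving the final answer."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[{"role": "user", "content": cot_prompt}],
)
print(response.choices[0].message.content)
# A step-by-step response (2 x 3.20 = 6.40; 6.40 + 2.50 = 8.90; 10.00 - 8.90 = 1.10)
# is much easier to verify than a bare number.
```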


At this point, however, the above only covers native use of a model and simpler ways to give it context and limit its results. The next section looks at more advanced ways to deal with hallucinations.


Give the AI engine a role. When we give the AI a role, such as a help bot, a support service for the elderly, a children's teacher or a historian, we set the tone of the response and thereby limit the scope of its ability to fabricate answers.
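
A minimal sketch of role assignment, again using the assumed OpenAI SDK: the role goes in the system message, and an explicit instruction to admit uncertainty narrows the room for fabrication.

```python
# Role assignment via a system message: the role narrows tone and scope,
# and the instruction to admit uncertainty discourages fabrication.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": (
                "You are a patient support assistant for elderly users of a "
                "broadband service. Answer only questions about the service, "
                "use plain language, and say 'I don't know' when unsure."
            ),
        },
        {"role": "user", "content": "My internet keeps dropping in the evening. What should I try?"},
    ],
)
print(response.choices[0].message.content)
```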


Finally, another method is Retrieval Augmented Generation (RAG). This is by far the most complex, as it requires you to add your own data in order to limit results entirely to the context of that data, while still leveraging the brilliance of an LLM. Using RAG also saves time: instead of retraining models with your data, the supplementary data you provide via some kind of document index can be updated independently, improving responses further.
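
In outline, RAG looks something like the sketch below. The `search_documents` retriever is a hypothetical placeholder; in practice it would sit in front of a vector store or search index built from your own documents, which you can update independently of the model.

```python
# Retrieval Augmented Generation, in outline:
# 1. retrieve relevant passages from your own document index,
# 2. paste them into the prompt as context,
# 3. instruct the model to answer only from that context.
from openai import OpenAI

client = OpenAI()

def search_documents(query: str, top_k: int = 3) -> list[str]:
    """Hypothetical retriever; in practice this would query a vector store
    or search index built from your own documents."""
    raise NotImplementedError

def answer_with_rag(question: str) -> str:
    passages = search_documents(question)
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```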


In conclusion, it is imperative that organisations wishing to leverage the power of LLMs understand some of the challenges the industry faces. Hallucinations are one of several, and it’s highly likely that more will be found, since we’re so early in the evolution of LLMs. Mastering the choice of LLM, the supporting technology and data (as in the case of RAG), and consistent, contextual prompting will help to create better results for customers and employees alike.

