Tackling Hallucinations in LLMs with RAG

The rise of Large Language Models (LLMs) like ChatGPT has unlocked a new capability for technology-focused organizations. It’s not just about transforming existing processes like customer service and FAQs - LLMs are creating new use cases and features that weren’t possible before.

What are hallucinations?

A hallucination in an LLM is a discrepancy between the output the model is expected to produce and what it actually produces, such as a confidently stated answer that isn’t grounded in fact.

As with any new technology, bleeding-edge companies are rushing to see where they can use LLMs. On the surface, that’s commendable: organizations don’t want to be caught flat-footed when a generational platform shift like this happens.

But with new capability comes new risk, and LLMs are no different. Chief among those risks is the “hallucination”, where an LLM like ChatGPT confidently states incorrect information. That could be as simple as asking who the current president is and being told “Abraham Lincoln.” That’s funny for a hobbyist, but dangerous for an enterprise relying on the LLM’s accuracy.

We’ll go over a technique called Retrieval-Augmented Generation, or RAG, which helps reduce the risk of hallucinations in modern LLMs.

It’s no silver bullet, but if you’re considering building LLM solutions, and worried about safety, it should be a core requirement.

Causes of hallucinations

Hallucinations usually stem from inconsistencies in the training data, or from pushing the LLM beyond its limits.

If an LLM is trained on incorrect data, those errors are amplified when it answers questions. It has no way of knowing the data is incorrect, because it simply gets very good at memorizing its training data. As far as it’s concerned, that training data is its knowledge of the world.

LLMs also make things up when they don’t have up-to-date knowledge. Asking ChatGPT who the president-elect is in December of this year might yield an inaccurate answer, depending on both its training cutoff and the election results.

There are two main types of hallucinations that LLMs produce.

Faithfulness hallucinations

Sometimes the LLM has all the right data, but still arrives at the wrong answer.

If you give it the text of a document and ask it to summarize it, it’s not guaranteed to do so perfectly every time. It had the right context, but didn’t reproduce the content faithfully.

Another great example is math. LLMs can’t reliably do math: they’re predicting token sequences, not performing calculations. So when you ask an LLM to do complicated math, there’s a good chance it won’t arrive at the correct answer.

These can be dangerous as well. If you ask an LLM to summarize a policy for a customer, you’re relying on that summary faithfully reflecting the policy.

Factual hallucinations

This is what organizations usually worry about. You ask ChatGPT who the first person to walk on the moon was, and it responds with Vladimir Putin. 

But fabrications can happen here, too. Imagine asking your company’s LLM to describe the refund policy, and it makes up an answer that’s nowhere in the terms and conditions. Without access to those documents, it’s pulling from the only information it has: its training data. That’s not reliable for an organization.

Using Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a promising way to address the concern of hallucinations.

The concept is simple: give the LLM access to knowledge beyond its training data. When it gets a question, it can encode that question and compare it against an external database, called a vector database, making it much easier to find the documents that best answer it.
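
To make that flow concrete, here’s a minimal sketch of the retrieve-then-answer loop. The names `embed`, `vector_db`, and `call_llm` are hypothetical stand-ins for whatever embedding model, vector store, and LLM client you actually use, not any particular library:

```python
def answer_with_rag(question, vector_db, embed, call_llm, k=5):
    # Encode the question into the same vector space as the stored documents.
    query_vector = embed(question)

    # Retrieve the k documents whose embeddings sit closest to the question.
    documents = vector_db.search(query_vector, top_k=k)

    # Pass the retrieved text to the LLM as context, so it answers from the
    # knowledge base instead of from its training data alone.
    context = "\n\n".join(doc.text for doc in documents)
    prompt = (
        "Answer the question using only the context below. "
        "If the context doesn't contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```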

Why can’t we use a traditional database?

Storing data in its raw form can be problematic for search. Searching for text data is easy enough if you know the exact phrase, like searching “2024 Toyota Rav4”. 

But what if you want to ask a more nuanced question? Like “What’s the best trim option for a 2024 Toyota Rav4 when I’m budget-conscious but also concerned about having the best features?” You might get lucky and that exact question is the title of an article. But odds are, it isn’t. And traditional search engines mostly match on keywords and exact phrases, meaning you won’t get great results.

A vector database makes it much easier to search the way humans ask questions. It uses an algorithm to compute a mathematical representation of each document, called a vector embedding.

There are mathematical formulas for measuring how similar two vectors are. Source data is converted into vectors with thousands of dimensions, which makes it possible to find content that is similar, not just identical.

We can ask a question and get back, say, the five closest documents whose content best matches it. From there, it’s straightforward to get this working with your LLM.
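
As a toy illustration of “finding similar vectors”, here’s a lookup using cosine similarity, one of the common measures vector databases use. Real embeddings have thousands of dimensions; these three-dimensional vectors and the example topics are made up for demonstration:

```python
import numpy as np

doc_vectors = np.array([
    [0.9, 0.1, 0.0],   # pretend: article on Rav4 trim levels
    [0.1, 0.8, 0.1],   # pretend: article on oil changes
    [0.0, 0.2, 0.9],   # pretend: article on ancient Rome
])
query = np.array([0.85, 0.15, 0.05])   # the embedded user question

# Cosine similarity = dot product of unit-normalized vectors.
doc_unit = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)
scores = doc_unit @ query_unit

# Indices of the closest documents, most similar first.
top_k = np.argsort(scores)[::-1][:2]
print(top_k, scores[top_k])   # the trim-levels article comes out on top
```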

So, a vector database becomes our "knowledge base", where original data from multiple sources is loaded. Then, when a user asks a question, the system:

  1. Decides which kind of data is most relevant to the question
  2. Retrieves that data from the knowledge base
  3. Uses an “answer writer” agent to synthesize a response from the data, following guidelines we set for this kind of question. The agent also includes a reference to the original piece of data it found.

That last part is crucial. 

Just like Wikipedia with citations, we can have more confidence in our answer if the source is included. We can tell if the answer is hallucinated or real.
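
As a rough sketch of that last step, the answer writer can be asked to cite the numbered sources it used, and those sources can be returned alongside the answer so they can be checked. As before, `embed`, `vector_db`, and `call_llm` are hypothetical stand-ins:

```python
def answer_with_citations(question, vector_db, embed, call_llm, k=5):
    # Retrieve the most relevant documents from the knowledge base.
    hits = vector_db.search(embed(question), top_k=k)

    # Number the sources so the answer writer can reference them.
    sources = "\n".join(f"[{i}] {hit.title}: {hit.text}" for i, hit in enumerate(hits))
    prompt = (
        "You are an answer writer. Answer the user's question using only the "
        "numbered sources below, and cite the source number after each claim, "
        "e.g. [0]. If the sources don't contain the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

    # Return the sources along with the answer so the citations can be verified.
    return call_llm(prompt), hits
```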

How to address hallucinations with RAG

If we give an LLM access to data outside its training set, we can unlock entire new capabilities while reducing the chance for error.

Start with quality data

Any data scientist will tell you that data quality forms the bedrock of any successful AI project. RAG is no different. It doesn’t matter how well designed your system or vector database is: garbage in equals garbage out.

What does data quality actually mean in the context of RAG?

  1. Clean data - Articles should be free from errors and duplicates. 
  2. Structured data - The more consistent the format, the better your retrieval will perform. Thousands of loosely scraped articles won’t cut it.
  3. Relevant data - It’s tempting to throw everything into a vector database, but that will make performance worse. Include only relevant data.

That last point is important. Giving your LLM all the data you have won’t make its performance better. Imagine sifting through articles about ancient Rome when you’re trying to locate the best way to change the oil on a mid-2000s Mercedes. At best, you’re increasing retrieval time. At worst, you’re confusing the LLM and degrading its answers.

Similarly, consider the audience. If you’re building a system for teenagers, but all of your knowledge base documents are scientific papers, your similarity matching will be worse, because the tone and question structure will not match. This isn’t an unsolvable problem, but it does mean more care in the prompt and setup. 
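
Going back to the first point, here’s a small, hedged example of what “clean” can mean in practice: a first pass that normalizes whitespace and drops empty or duplicate documents before anything gets embedded. Real pipelines usually go further, with near-duplicate detection, boilerplate stripping, and relevance filtering:

```python
def clean_documents(raw_docs: list[str]) -> list[str]:
    """Basic hygiene before indexing: normalize, drop empties, dedupe."""
    seen = set()
    cleaned = []
    for doc in raw_docs:
        text = " ".join(doc.split())   # collapse stray whitespace and newlines
        if not text or text in seen:   # skip empty documents and exact duplicates
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned
```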

Take time to craft your prompt, and keep tuning it

LLMs are unpredictable. The same prompt in GPT-3.5 might behave differently (or worse) in GPT-4 or GPT-4o. 

The best hedge against this is a bulletproof, finely tuned prompt. Here are a few characteristics of a dependable prompt:

  1. Include examples. LLMs perform best when they have relevant examples and context to build on and structure their responses around. If they’re going to answer questions about refrigerators, include examples of how a response should be structured, key points to include, and information about the customer.
  2. Give context to your RAG data. Explain a dataset so the LLM knows which repository might be useful for which question. As you add more data sources, it’s helpful for the LLM to know which one to look to.
  3. Don’t overload the prompt. It’s tempting to give it all the context in the world, but your text window is only so large. The more context, instructions, and examples you give, the less room you have for new information. Strike a balance between the two.
  4. Add guardrails. Make sure the LLM has instructions on what to talk about, what not to talk about, and what to do if a user tries to go off the rails. 
  5. Adjust the user’s question to fit the database. A simple instruction for the LLM can have a big impact. If it has knowledge of the general structure of data in the vector database, it can tweak the user’s question to improve the chances of finding the right information. 

Prompt adjustment is an often-overlooked aspect of RAG. Simply telling an LLM that it has access to an external database isn’t useful if the LLM doesn’t have context on what the database is used for, when to access it, what the data looks like, etc. 
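
To tie these points together, here’s a hedged example of what such a system prompt might look like for the refrigerator scenario mentioned above. The wording, structure, and scenario details are illustrative assumptions, not a canonical template:

```python
SYSTEM_PROMPT = """\
You are a support assistant for our refrigerator customers.

You have access to a knowledge base of product manuals and warranty policies.
Before searching it, rewrite the user's question as a short, keyword-style
query that matches how those documents are written.

When answering:
- Structure the response as: short answer, key steps, relevant policy excerpt.
- Only discuss refrigerators, warranties, and related support topics.
- If the user tries to go off the rails, politely decline and redirect them.
- If the knowledge base does not contain the answer, say you don't know.

Example:
User: "My fridge is making a weird humming noise, is that covered?"
Rewritten query: "refrigerator humming noise warranty coverage"
Answer: short answer first, then steps, then the relevant warranty clause.
"""
```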

Use LLMs to check if the answer is correct and in your knowledge base

We’ve covered the beginning and middle of the pipeline: setting up for success with quality data, and structuring retrieval correctly with focused prompts.

But once we retrieve the information, we need to make sure it’s correct and useful.

There are a few ways we can do this.

  1. Use LLMs to evaluate the answer. A “grader” agent can analyze the user’s question and what RAG returned, and check that the question is actually answered. This agentic approach also helps offload some context: we don’t have to make the question-answering LLM also the answer-grading LLM, saving valuable space in the context window.
  2. Identify and test correct answers. The system can use the same grader approach to ensure the answer is actually present in the knowledge base, and prompt for a retry if it isn’t. Verifying correctness becomes much easier this way: the LLM checks that the information from the vector database really appears in the answer the user gets.
  3. Include citations. Adding a step that includes citations as part of the retrieval process can improve results. This can range from naming the reference document to quoting specific passages and figures. It works well in tandem with an agent LLM, which can easily check those documents to see if the citations are correct.

This is an evolving practice, but a still-underrated way to improve the reliability of your system.

It can be as simple or as complex as needed, from a single GPT-powered grading step to referencing the original database and keeping track of hallucinations over time.
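
Here’s a minimal sketch of that grading step, assuming a hypothetical `call_llm` client and a JSON response format of our own choosing:

```python
import json

def grade_answer(question, answer, sources, call_llm):
    # Ask a second "grader" LLM whether the answer addresses the question
    # and is actually supported by the retrieved sources.
    joined_sources = "\n".join(sources)
    prompt = (
        "You are a strict grader. Given a question, an answer, and the sources "
        "the answer should be based on, reply with JSON in the form "
        '{"answers_question": true|false, "supported_by_sources": true|false, '
        '"unsupported_claims": []}.\n\n'
        f"Question: {question}\n\nAnswer: {answer}\n\nSources:\n{joined_sources}"
    )
    return json.loads(call_llm(prompt))

# If grading fails, the system can retry retrieval or fall back to
# "I don't know" instead of shipping a possible hallucination.
```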

What else do I have to look out for?

This is just the tip of the iceberg. Introducing RAG into your product’s architecture means new opportunities, but also new problems: prompt injection, data integrity and security, system performance, cost. All are points to consider.

If you’re not sure where to start, we’ve built enterprise-level solutions for Fortune 500 clients. Our team is ready to understand your specific use case and build a custom system around it.

Reach out to learn more.
