Improving Protect Line Customer Experience With an AI Chatbot

NineTwoThree developed an AI chatbot for Protect Line to enhance customer experience and sales through personalized conversations.

Concept

Protect Line Ltd., one of the UK’s largest insurance companies, faced a problem with its sales process: a significant amount of time was devoted to leads that did not convert. The experience for potential customers was also poor, because they did not want to engage in lengthy conversations with sales. Protect Line wanted to explore an AI chatbot that could improve the experience for both its own team and prospective customers.

Challenges

Creating an AI chatbot with the level of sophistication Protect Line needed for production involved numerous challenges with Large Language Models (LLMs). When NineTwoThree began to work on the project, our engineers flagged several challenges that we would need to solve to move forward.

Designing a Chatbot that Acts Like a Human

The chatbot we were creating had a series of 17 questions to ask prospective customers, but firing them all at once would overwhelm a prospective customer. The NineTwoThree team had to find a conversational way to ask the questions while maintaining data quality. If a potential customer gave a vague answer, the chatbot had to ask for clarification until it had all the necessary information.

In addition to getting answers to its questions, the chatbot needed to respond the way a human Protect Line agent would, rather than like a generic chatbot. Lastly, the chatbot needed to drive the conversation in a way that would make the prospective customer want to talk to a salesperson.

Chatbot Controls

Since Protect Line is in a regulated industry, the chatbot also had to comply with the applicable regulations. It had to use non-advisory language and couldn’t actually recommend products. It also had to be able to handle distressed potential customers who may have mental health challenges such as depression. The chatbot needed to know when to cut the conversation short and pass it to a human.

Passing High-Quality Leads to Sales

NineTwoThree explored whether it could score leads based on how similar a potential customer’s questions were to those of Protect Line’s best leads. For scoring to work, the chatbot had to get enough information from the prospective customer.

Solution

The NineTwoThree team overcame the challenges in our proof of concept app for Protect Line using non-standard LLM methods. Our team also had to optimize the models through rigorous testing, analysis, and fine-tuning to enhance the solution's accuracy.

Making the Chatbot Conversational

At the proof-of-concept stage, the chatbot experience is closer to a quick quiz. The NineTwoThree engineering team created a “question/answer mode” where the prospective customer can learn more about what the chatbot is asking and then return to answer the question correctly.

Making Sure the Chatbot Gives Correct Information

Although LLMs pre-trained by vendors perform relatively well on general topics, they lack knowledge of more specific domains, can hallucinate, and have a knowledge cutoff set by their pre-training date. To address this, we implemented retrieval-augmented generation (RAG) backed by vector databases, using FAQ and help articles as the main source of information. By combining retrieval with the generation capabilities of a state-of-the-art pre-trained language model, the system gives more accurate and reliable answers when generating content and interacting with users.
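The retrieval step can be illustrated with a minimal sketch. It is not the production system: the FAQ snippets are hypothetical, and a toy bag-of-words cosine similarity stands in for the learned embeddings and vector database described above. It only shows the general RAG shape: retrieve the most relevant article, then place it in the prompt as context for the LLM.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ snippets standing in for the real FAQ and help articles.
FAQ_ARTICLES = [
    "Life insurance pays a lump sum to your loved ones if you die during the policy term.",
    "Critical illness cover pays out if you are diagnosed with a specified serious illness.",
    "You can cancel your policy at any time by contacting our customer care team.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, articles: list[str], k: int = 1) -> list[str]:
    """Return the k articles most similar to the question."""
    q = embed(question)
    return sorted(articles, key=lambda art: cosine(q, embed(art)), reverse=True)[:k]


def build_prompt(question: str, articles: list[str], k: int = 1) -> str:
    """Assemble the prompt that would be sent to the LLM (the LLM call itself is omitted)."""
    context = "\n".join(f"- {a}" for a in retrieve(question, articles, k))
    return (
        "Answer using only the context below. "
        "If the context does not cover the question, say you do not know.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How do I cancel my policy?", FAQ_ARTICLES))
```

Grounding the answer in retrieved text is what lets the chatbot stay within the company's own material rather than relying on whatever the model learned during pre-training.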

Creating Custom Prompts

A single LLM is limited in context window size and attention span, so it can’t handle many different tasks at the quality we needed. We moved away from relying on a single LLM and created a unique combination of LLMs under the hood to achieve the best performance-to-cost ratio. This involved writing prompts from scratch, then validating and fine-tuning them for:

holding conversations

concluding conversations

qualifying call prompts

analyzing call prompts based on user engagement

question answering using retrieval augmented generation

evaluating customer relevancy

checking the validity of provided user data

summarizing conversations

Passing the Conversation to Humans

To decide when to hand a conversation to a sales representative, we created a list of 50 keywords that require the chatbot to immediately pass the prospective customer to a human. This ensures that anyone suffering from a mental health issue or having other troubles is handled correctly by a trained person.
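A keyword-triggered handoff of this kind could look roughly like the sketch below. The trigger phrases are hypothetical placeholders (the actual list of 50 keywords is not given here), and `needs_human` is an illustrative name, not part of the real system.

```python
import re

# Hypothetical trigger phrases; the real list of 50 keywords is not reproduced here.
ESCALATION_KEYWORDS = {"depressed", "hopeless", "self harm", "bereavement"}

# One case-insensitive pattern with word boundaries, so "depressed" matches
# but a longer word that merely contains a keyword does not.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(ESCALATION_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def needs_human(message: str) -> bool:
    """Return True when the message contains any escalation keyword."""
    return _PATTERN.search(message) is not None


if __name__ == "__main__":
    for text in ("I've been feeling depressed lately", "What does the policy cover?"):
        print(text, "->", "hand off" if needs_human(text) else "continue")
```

Running the check before any LLM call keeps the handoff deterministic: the decision does not depend on how a model happens to interpret the message.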

Chain of Thought Reasoning

Protect Line had 17 database questions it needed answers to before a sales representative could get on the call with the customer. To achieve this, we used chain-of-thought reasoning with a ReAct bot. First, the bot takes an action in the environment through LangChain, and the resulting observation is passed back to the LLM. The LLM then refines its reasoning, which produces multi-step reasoning across multiple questions and answers. The bot repeats this process until all 17 questions are answered.
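The thought, action, observation cycle can be sketched in plain Python. This is an illustration only and does not use LangChain's API: the three questions are a hypothetical subset of the 17, and `simple_validate` is a stand-in for the judgment the LLM would make about whether a reply actually answers the question.

```python
from typing import Callable, Optional

# Hypothetical subset of the 17 questions; the real list is not reproduced here.
QUESTIONS = {
    "age": "How old are you?",
    "smoker": "Do you smoke?",
    "cover_amount": "How much cover are you looking for?",
}


def simple_validate(field: str, reply: str) -> Optional[str]:
    """Stand-in for the LLM's judgment: reject empty or vague replies."""
    cleaned = reply.strip()
    if cleaned.lower() in {"", "not sure", "maybe", "i don't know"}:
        return None
    return cleaned


def run_interview(
    questions: dict[str, str],
    ask_user: Callable[[str], str],
    validate: Callable[[str, str], Optional[str]] = simple_validate,
    max_turns: int = 50,
) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Loop thought -> action -> observation until every question has an answer."""
    answers: dict[str, str] = {}
    trace: list[tuple[str, str]] = []
    for _ in range(max_turns):  # cap the turns so a stuck conversation cannot loop forever
        missing = [f for f in questions if f not in answers]
        if not missing:
            break
        field = missing[0]
        trace.append(("thought", f"'{field}' is still unanswered"))
        trace.append(("action", questions[field]))   # act: ask the question
        reply = ask_user(questions[field])
        trace.append(("observation", reply))         # observe: the user's reply
        value = validate(field, reply)
        if value is not None:
            answers[field] = value                   # a vague reply is asked again next turn
    return answers, trace


if __name__ == "__main__":
    scripted = iter(["not sure", "34", "no", "100000"])
    answers, trace = run_interview(QUESTIONS, lambda q: next(scripted))
    print(answers)
```

In this sketch a vague reply such as "not sure" leaves the field unanswered, so the same question comes up again on the next turn, which mirrors the requirement that the chatbot keep clarifying until it has usable data.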

Scoring Leads

After the conversation with a user is finished, the system generates a report based on the lead’s engagement and quality. Engagement is rated as high, medium, or low, while lead quality is rated as hot, warm, or cold. The NineTwoThree team also created a place for the Protect Line team to rate each conversation from their perspective, so that the feedback can be used to improve the chatbot.
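One way to represent such a report is sketched below. The field names, the `LeadReport` class, and the 1 to 5 rating scale are assumptions made for illustration; only the two label sets (high/medium/low and hot/warm/cold) come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Engagement(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Quality(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


@dataclass
class LeadReport:
    """Post-conversation report; all field names here are hypothetical."""
    lead_id: str
    engagement: Engagement
    quality: Quality
    summary: str
    agent_rating: Optional[int] = None  # assumed 1-5 scale, filled in later by a human

    def rate(self, rating: int) -> None:
        """Record the sales team's own rating of the conversation."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.agent_rating = rating


if __name__ == "__main__":
    report = LeadReport("lead-001", Engagement.HIGH, Quality.WARM, "Asked about term cover.")
    report.rate(4)
    print(report)
```

Keeping the human rating on the same record as the automated labels makes it straightforward to compare the two later when tuning the prompts.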

To score leads properly, the NineTwoThree team developed an evaluation suite for in-depth analysis. We created custom prompts for each of the following aspects of the LLM’s output:

Conciseness: Evaluating if the responses are succinct yet informative.
Emotional intelligence: Assessing the chatbot's ability to recognize and respond appropriately to user emotions.
Coherence: Verifying the truthfulness and accuracy of the information provided in responses with the knowledge of ground truth.
Latency: Ensuring the chatbot responds within a reasonable timeframe.
Response price and token usage: Making sure the chatbot’s price and token usage stay within budget.
Fluency: Verifying the chatbot speaks as naturally as possible for a positive user experience.

The NineTwoThree team worked iteratively on these prompts and ran multiple experiments to achieve the best results.
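The two mechanical criteria in the list above, latency and token usage, can be checked without an LLM judge. The sketch below is an assumption-laden illustration: the thresholds are invented, `fake_model` stands in for a real model call, and whitespace-separated words are used as a crude proxy for tokens, since a real suite would use the model's own tokenizer.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    latency_s: float
    tokens: int
    within_latency: bool
    within_budget: bool


def evaluate_response(
    generate: Callable[[str], str],
    prompt: str,
    max_latency_s: float = 2.0,   # hypothetical latency limit
    max_tokens: int = 150,        # hypothetical per-response token budget
) -> EvalResult:
    """Time one call to `generate` and check the reply against both limits."""
    start = time.perf_counter()
    reply = generate(prompt)
    latency_s = time.perf_counter() - start
    tokens = len(reply.split())  # crude proxy: whitespace-separated words
    return EvalResult(
        latency_s=latency_s,
        tokens=tokens,
        within_latency=latency_s <= max_latency_s,
        within_budget=tokens <= max_tokens,
    )


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return "Thanks, could you tell me your age?"


if __name__ == "__main__":
    print(evaluate_response(fake_model, "Ask the next question."))
```

The subjective criteria (conciseness, emotional intelligence, fluency) need a judge prompt or human review, so they are left out of this sketch.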