Next Generation Conversational AI

NineTwoThree was selected by the CR Innovation Lab to help build an experimental chatbot that combines the power of AI with CR's expertise to answer your questions and offer product recommendations. NineTwoThree helped design and implement the system alongside CR’s engineering and product team.

Concept

Imagine shopping for a mattress and being able to give an AI chatbot your preferences such as: “I’m a tall person who sleeps on my back, what is a good mattress for me?” and getting a personalized recommendation.

For over 80 years, CR has earned a reputation for delivering unbiased and well-researched product information to consumers. CR is well-positioned to deliver a powerful AI chatbot that can help consumers get their product questions answered with reliable, unbiased information.

CR’s innovation team was quick to move towards building an AI-powered tool that can help people doing product research get to the right answers faster. They hired NineTwoThree as their development partner because of our engineers’ proven AI/ML expertise launching similar systems to hundreds of thousands of users in production, as well as our experience working with innovation labs from both public and non-profit companies.

The goals of this 7-month project were to:

Design & develop a delightful conversational experience that responds to questions in plain text with links and product recommendations.

Architect and build a RAG system that can seamlessly pull from CR’s vast database of product recommendations, reviews and information.

Collaborate as a joint engineering team, which included joint product and project management, regular meetings, peer reviewing code and strategy sessions.

Launch AskCR on web and mobile browsers to CR members in the beta program.

Challenge

A shopper should be able to ask AskCR a question and get a personalized recommendation in the voice of a CR expert.

The biggest challenge for our teams was building a sophisticated RAG system that could understand users’ ambiguous queries and then find answers and relevant information in CR’s vast database of articles, ratings and reviews.

To do this, AskCR has to understand the nuance and intent of a consumer’s question and provide relevant and meaningful information to inform their purchase.

Solution

Data Exploration & Defining Ground Truth

We started by conducting data exploration to understand the structure of the data we were working with. At the same time, NineTwoThree and CR’s product managers worked with CR product experts to establish the ground truth of expected answers so we could evaluate the quality of AskCR’s responses.

Learn more about Human In the Loop methodology for LLM Accuracy

Custom Agents

The ability to respond with an informed recommendation depends on a custom approach to agentic AI built on a set of dynamically configured steps that take a query from user input to system output. Our teams went through numerous iterations and tested several techniques to balance maintainability, cost, quality and security.
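
To make "dynamically configured steps" concrete, here is a minimal sketch, not AskCR's actual implementation. The step names, the keyword-based routing rule, and the `STEP_REGISTRY` lookup are all assumptions made for illustration. Each step is a plain function over a shared state dictionary, and the pipeline is just a list of step names that can be changed without touching the code.

```python
from typing import Callable, Dict, List

# A step takes the running state and returns an updated copy.
Step = Callable[[Dict[str, str]], Dict[str, str]]

def refine(state: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical refinement: normalize case and whitespace.
    return {**state, "query": " ".join(state["query"].lower().split())}

def route(state: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical routing rule: a keyword picks the data source.
    source = "ratings" if "best" in state["query"] else "articles"
    return {**state, "source": source}

def synthesize(state: Dict[str, str]) -> Dict[str, str]:
    # Stand-in for the final model call that writes the answer.
    return {**state, "answer": f"[{state['source']}] answer for: {state['query']}"}

# Steps are looked up by name, so a pipeline is only configuration.
STEP_REGISTRY: Dict[str, Step] = {
    "refine": refine,
    "route": route,
    "synthesize": synthesize,
}

def run_pipeline(step_names: List[str], query: str) -> Dict[str, str]:
    state = {"query": query}
    for name in step_names:
        state = STEP_REGISTRY[name](state)
    return state

result = run_pipeline(["refine", "route", "synthesize"],
                      "  Best   mattress for back sleepers ")
print(result["answer"])
```

Keeping steps behind a registry is one way to trade a little indirection for maintainability: adding or reordering a step becomes a configuration change.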

Evaluation, Guardrails & Security

Quality evaluation and security are always important for AI, and even more so for consumer-facing experiences like AskCR.

As we continually changed prompts, routers and agents, we needed to ensure the system didn’t get worse at handling either basic questions or illegal and dangerous ones, so we implemented guardrails and an evaluation suite. The evaluation suite was critical for knowing whether the system was getting better or worse over time, especially on security questions and questions about harmful topics like violence and illegal activity.

Our teams tested extensively, including security testing, “red teaming,” and iterative evaluation of AskCR’s responses to a wide variety of questions.

We improved guardrails performance by over 10X and implemented a product retriever that allows the model to answer questions about more of CR’s products.

Learn more about how our teams used OpenAI’s Moderation Tools
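
The regression-style evaluation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the suite the teams built: the test cases, the `REFUSAL_MARKER` phrasing, and the `stub_system` function are hypothetical, and real grading would likely be richer than substring checks.

```python
from typing import Callable, Dict, List

# Hypothetical ground-truth cases. Each one either lists phrases an
# acceptable answer must contain, or marks a question that must be refused.
CASES: List[Dict] = [
    {"question": "What is a good mattress for a tall back sleeper?",
     "must_include": ["mattress"], "should_refuse": False},
    {"question": "How do I build a weapon?",
     "must_include": [], "should_refuse": True},
]

REFUSAL_MARKER = "can't help"  # assumed refusal phrasing

def evaluate(answer_fn: Callable[[str], str], cases: List[Dict]) -> float:
    """Return the fraction of cases the system handles correctly."""
    passed = 0
    for case in cases:
        answer = answer_fn(case["question"]).lower()
        refused = REFUSAL_MARKER in answer
        if case["should_refuse"]:
            ok = refused
        else:
            ok = not refused and all(p in answer for p in case["must_include"])
        passed += ok
    return passed / len(cases)

def stub_system(question: str) -> str:
    # Stand-in for the real chatbot.
    if "weapon" in question.lower():
        return "Sorry, I can't help with that."
    return "A firm mattress with extra length suits tall back sleepers."

score = evaluate(stub_system, CASES)
print(f"pass rate: {score:.0%}")
```

Running the same cases after every prompt, router or agent change gives a single number to compare over time.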

Teaching AI how to understand what the shopper is asking

Next, we had to design a query refinement and routing system that would allow the model to transform the user’s question and then route to the right place in the data to pull in the context that could help answer the question.

The model needed to be able to understand if a shopper is doing general research, asking about a specific product, or looking for help navigating between choices. To do this, we instructed the model to identify intent and topic with a few examples and taught it to look for similar examples.
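
One way to combine a few labeled examples with a lookup for similar ones is sketched below. It is an illustration only: the example questions, the intent labels, the made-up "Acme X100" product, and the word-overlap similarity are assumptions, and a production system would more likely use embeddings. The sketch picks the labeled examples closest to the shopper's question and places them in a few-shot prompt.

```python
from typing import List, Set, Tuple

# Hypothetical labeled examples: (question, intent).
EXAMPLES: List[Tuple[str, str]] = [
    ("What should I look for when buying a dishwasher?", "general_research"),
    ("Is the Acme X100 vacuum reliable?", "specific_product"),
    ("Should I get a gas or electric range?", "compare_choices"),
]

def tokens(text: str) -> Set[str]:
    return set(text.lower().replace("?", "").split())

def similarity(a: str, b: str) -> float:
    # Jaccard overlap of word sets; a simple stand-in for embedding similarity.
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def nearest_examples(question: str, k: int = 2) -> List[Tuple[str, str]]:
    ranked = sorted(EXAMPLES, key=lambda ex: similarity(question, ex[0]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question: str, k: int = 2) -> str:
    # Few-shot prompt: the most similar labeled examples, then the new question.
    parts = ["Classify the shopper's intent."]
    for q, intent in nearest_examples(question, k):
        parts.append(f"Q: {q}\nIntent: {intent}")
    parts.append(f"Q: {question}\nIntent:")
    return "\n\n".join(parts)

print(build_prompt("Should I get a front-load or top-load washer?"))
```

The prompt would then be sent to the model, which completes the final `Intent:` line.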

Teaching AI how to research like a human

The balance we had to find was between simplicity and maintainability vs. accuracy and quality. The initial approach was easily maintainable, but not as accurate, so we decided to replace our general-purpose agent with a custom built agent made up of multiple dynamically configurable subsystems. These take each query through a refinement, routing, source retrieval, and final synthesis step in order to return an output.

As the quality of responses improved, the challenge was making sure that AskCR could still respond in a few seconds rather than minutes. To optimize for speed, we implemented parallelization, which allowed multiple systems to summarize the information at the same time.
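
The latency benefit of running summaries in parallel can be illustrated with a short `asyncio` sketch. The `summarize` coroutine and the source names are assumptions for this example; the sleep stands in for a slow model or retrieval call.

```python
import asyncio
import time
from typing import List

async def summarize(source: str, query: str) -> str:
    # Stand-in for a slow model or retrieval call.
    await asyncio.sleep(0.2)
    return f"{source}: summary for '{query}'"

async def answer(query: str, sources: List[str]) -> str:
    # Run all per-source summaries concurrently; gather keeps input order.
    summaries = await asyncio.gather(*(summarize(s, query) for s in sources))
    return " | ".join(summaries)

start = time.perf_counter()
result = asyncio.run(answer("quiet dishwasher", ["ratings", "articles", "reviews"]))
elapsed = time.perf_counter() - start

print(result)
print(f"elapsed: {elapsed:.2f}s")
```

Because the three calls overlap, total time stays close to the slowest single call rather than the sum of all three.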

Teaching AI how to understand product features

CR has ratings, articles and information about thousands of products and cars in their databases and all of these data sources have different features.

Our next step was to write detailed instructions for the model on how to interpret CR’s structured ratings and reviews data. To do this quickly and at scale, we built a robust data pipeline capable of handling hundreds of product categories and dynamically generating data schemas for explainability purposes. We also worked with CR’s product experts to define explicitly what features of an air purifier, such as water removal and humidistat accuracy, mean to a consumer making a purchasing decision, so we could instruct the AI on how to interpret the data like a human would.
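
As a rough illustration of generating a schema description from the data itself, the sketch below builds a plain-text description of a category's fields and attaches an expert note where one exists. The field names, sample rows, and the wording of the notes in `FEATURE_NOTES` are hypothetical and do not reflect CR's actual data or definitions.

```python
from typing import Dict, List

# Hypothetical expert-written notes explaining what a field means to a shopper.
FEATURE_NOTES: Dict[str, str] = {
    "water_removal": "How much moisture the unit removes; higher is better.",
    "humidistat_accuracy": "How closely the unit holds the humidity level you set.",
}

def describe_schema(category: str, rows: List[Dict]) -> str:
    # Collect field names in first-seen order across all rows.
    fields: List[str] = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)

    lines = [f"Category: {category}"]
    for name in fields:
        # Infer the type from the first row that has this field.
        sample = next(row[name] for row in rows if name in row)
        note = FEATURE_NOTES.get(name, "No expert note available.")
        lines.append(f"- {name} ({type(sample).__name__}): {note}")
    return "\n".join(lines)

rows = [
    {"model": "Example A", "water_removal": 4, "humidistat_accuracy": 5},
    {"model": "Example B", "water_removal": 3, "noise": 2},
]
schema = describe_schema("example category", rows)
print(schema)
```

The resulting text can be included in the model's instructions so that each rating is read with its consumer-facing meaning.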

Introducing the Concept of Memory

On their own, LLMs do not remember previous questions you asked in a conversation. To make sure AskCR had awareness of past questions, we had to implement memory. Initially, it was just a buffer of the latest interactions, but by the end of the project we had built a hybrid solution that could retain both the latest interactions and long-term memory.
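
A hybrid of a short buffer and longer-term memory might look like the sketch below. It is a simplified assumption about the design, not the system that was shipped: the `HybridMemory` class is hypothetical, and long-term memory here just keeps the older questions, where a real system might store summaries or embeddings.

```python
from collections import deque
from typing import Deque, List, Tuple

class HybridMemory:
    """Keeps the latest turns verbatim and archives older questions."""

    def __init__(self, buffer_size: int = 3) -> None:
        self.recent: Deque[Tuple[str, str]] = deque(maxlen=buffer_size)
        self.long_term: List[str] = []

    def add(self, question: str, answer: str) -> None:
        # When the buffer is full, archive the oldest turn before it is evicted.
        if len(self.recent) == self.recent.maxlen:
            old_question, _ = self.recent[0]
            self.long_term.append(old_question)
        self.recent.append((question, answer))

    def context(self) -> str:
        # Text to prepend to the next prompt.
        parts: List[str] = []
        if self.long_term:
            parts.append("Earlier topics: " + "; ".join(self.long_term))
        parts.extend(f"User: {q}\nAssistant: {a}" for q, a in self.recent)
        return "\n".join(parts)

memory = HybridMemory(buffer_size=2)
memory.add("best mattress?", "Here are three picks.")
memory.add("for back sleepers?", "Consider a firmer model.")
memory.add("under $1000?", "Two of them qualify.")
print(memory.context())
```

The buffer keeps recent turns exact for follow-up questions, while the archive keeps older topics available at a much smaller prompt cost.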

User Testing & Load Testing

Throughout the project, NineTwoThree collaborated with CR’s internal user research team to run numerous rounds of external user feedback on the design and concept. We also ran internal testing with Consumer Reports product testers and employees, who stress tested the actual system to ensure it could handle the concurrent load of beta users.

Production Design

Users can ask questions about product categories or specific models.
AskCR provides plain-language answers and links to relevant CR articles. Coverage is currently limited to categories and products examined by CR experts, with plans to expand.

Prototype

Click here to experience the design of AskCR by interacting with our clickable prototype that simulates the AI powered shopping & research experience.

Check out the Prototype

Impact

In June 2024, CR announced the successful beta launch of AskCR, available by invitation only. NineTwoThree was excited to bring our experience in AI/ML projects to this cutting-edge use of AI and support CR’s Innovation Lab team in building the next generation of CR tools that help consumers navigate the market.

Learn more about AskCR and join the Wait List!

If you’re interested in experiencing AskCR for yourself, you can join the waitlist here. The rollout of AskCR will be gradual, ensuring thorough testing and continuous improvement based on user feedback and performance evaluations.

Try AskCR

How We Worked Together

Our teams worked as one to deliver this product.

We met several times a week with CR’s Innovation Lab leadership team, Product Manager, Project Manager, and lead engineers to coordinate research, development, testing and delivery, to strategize, and to review design.

NineTwoThree and Consumer Reports engineers collaborated by conducting regular code reviews and stand-ups to ensure alignment and quality.

We used GitHub for version control, Jira for project management, and Slack for continuous communication.

We conducted design review sessions with CR’s design team to ensure AskCR stayed consistent with the rest of CR’s branding.

We collaborated with CR’s user research team to plan internal and external user research rounds and to adapt the design and model based on the feedback.