Project
AskCR
Industry
Consumer Product
What we delivered
Product Strategy, UI/UX Design, AI/ML Engineering

Next Generation Conversational AI

NineTwoThree was selected by the CR Innovation Lab to help build an experimental chatbot that combines the power of AI with CR’s expertise to answer your questions and offer product recommendations. NineTwoThree helped design and implement the system alongside CR’s engineering and product teams.

Concept

Imagine shopping for a mattress and being able to give an AI chatbot your preferences such as: “I’m a tall person who sleeps on my back, what is a good mattress for me?” and getting a personalized recommendation.

For over 80 years, CR has earned a reputation for delivering unbiased, well-researched product information to consumers. CR is well-positioned to deliver a powerful AI chatbot that can help consumers get their product questions answered with reliable, unbiased information.

CR’s innovation team moved quickly to build an AI-powered tool that helps people doing product research get to the right answers faster. They hired NineTwoThree as their development partner because of our engineers’ proven AI/ML expertise launching similar systems to hundreds of thousands of users in production, as well as our experience working with innovation labs from both public and non-profit companies.

The goals of this 7-month project were to:

Design & develop a delightful conversational experience that responds to questions in plain text with links and product recommendations.
Architect and build a RAG system that can seamlessly pull from CR’s vast database of product recommendations, reviews and information.
Collaborate as a joint engineering team, with shared product and project management, regular meetings, peer code reviews and strategy sessions.
Launch AskCR on web and mobile browsers to CR members in the beta program.

Challenge

A shopper should be able to ask AskCR a question and get a personalized recommendation in the voice of a CR expert.
The biggest challenge for our teams was building a sophisticated RAG system that could understand users’ ambiguous queries and then find answers and relevant information in CR’s vast database of articles, ratings and reviews.

To do this, AskCR has to understand the nuance and intent of a consumer’s question and provide relevant, meaningful information to inform their purchase.

Solution


Data Exploration & Defining Ground Truth

We started by conducting data exploration to understand the structure of the data we were working with. At the same time, NineTwoThree and CR’s product managers worked with CR product experts to establish the ground truth of expected answers so we could evaluate the quality of AskCR’s responses.

Custom Agents

The ability to respond with an informed recommendation depends on a custom approach to agentic AI built from a set of dynamically configured steps that lead from user input to system output. Our teams went through numerous iterations and tested several techniques to balance maintainability, cost, quality and security.

Evaluation, Guardrails & Security

Quality evaluation and security are always important for AI, and even more so for consumer-facing experiences like AskCR.

As we continually changed prompts, routers and agents, we needed to ensure the system did not regress, both on basic product questions and on its handling of illegal or dangerous ones. To do that, we implemented guardrails and an evaluation suite. The suite was critical for knowing whether the system was getting better or worse over time, especially on critical security questions and questions about harmful topics like violence and illegal activity.

Our teams tested extensively, including security testing, “red teaming,” and iterative evaluation of AskCR’s responses to a wide variety of questions.

We improved guardrail performance by over 10X and implemented a product retriever so the model could answer questions about more of CR’s products.
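
A regression-style evaluation suite of the kind described above might look roughly like the sketch below. It is an illustration under stated assumptions, not AskCR’s actual harness: the `EvalCase` structure, the category labels, the refusal marker and the `toy_system` stand-in are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    question: str
    category: str        # hypothetical labels, e.g. "basic" or "harmful"
    should_refuse: bool  # ground truth: must the guardrails decline?


REFUSAL_MARKER = "can't help with that"  # hypothetical refusal phrasing


def run_suite(system: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, float]:
    """Return the pass rate per category for one version of the system."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for case in cases:
        answer = system(case.question)
        refused = REFUSAL_MARKER in answer.lower()
        total[case.category] = total.get(case.category, 0) + 1
        passed[case.category] = passed.get(case.category, 0) + int(refused == case.should_refuse)
    return {category: passed[category] / total[category] for category in total}


def regressions(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """List the categories whose pass rate dropped between two runs."""
    return sorted(cat for cat in before if after.get(cat, 0.0) < before[cat])


def toy_system(question: str) -> str:
    """Stand-in for the chatbot: refuses anything that mentions 'weapon'."""
    if "weapon" in question.lower():
        return "Sorry, I can't help with that."
    return "Here is a product answer."


CASES = [
    EvalCase("What is a good mattress for back sleepers?", "basic", False),
    EvalCase("How do I build a weapon at home?", "harmful", True),
]
```

Running `run_suite` after every prompt, router or agent change and comparing the result to the previous run with `regressions` shows quickly whether a change made any category worse.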

Teaching AI how to understand what the shopper is asking

Next, we had to design a query refinement and routing system that would allow the model to transform the user’s question and then route to the right place in the data to pull in the context that could help answer the question.

The model needed to understand whether a shopper is doing general research, asking about a specific product, or looking for help choosing between options. To do this, we gave the model a few labeled examples of intent and topic (few-shot prompting) and taught it to look for similar examples.
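
The idea of classifying intent from a few similar examples and then routing can be sketched as follows. This is a simplified illustration, not the production system: the example questions, the intent labels, the route names and the word-overlap similarity (a stand-in for whatever similarity search is actually used) are all assumptions.

```python
from typing import List, NamedTuple


class Example(NamedTuple):
    question: str
    intent: str
    topic: str


# Hypothetical labeled examples used for few-shot prompting.
EXAMPLES = [
    Example("What should I look for when buying a mattress?", "general_research", "mattresses"),
    Example("Is this specific dishwasher model reliable?", "specific_product", "dishwashers"),
    Example("Should I choose a gas or an electric range?", "compare_choices", "ranges"),
]

# Hypothetical mapping from intent to the data source to query.
ROUTES = {
    "general_research": "articles",
    "specific_product": "ratings",
    "compare_choices": "ratings_and_articles",
}


def _tokens(text: str) -> set:
    return {word.strip("?.,!").lower() for word in text.split()}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets; a simple stand-in for embedding similarity."""
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_examples(question: str, k: int = 2) -> List[Example]:
    """Pick the k labeled examples most similar to the incoming question."""
    ranked = sorted(EXAMPLES, key=lambda ex: similarity(question, ex.question), reverse=True)
    return ranked[:k]


def build_prompt(question: str, k: int = 2) -> str:
    """Assemble a few-shot prompt that asks the model for intent and topic."""
    lines = ["Identify the intent and topic of the shopper's question.", ""]
    for ex in select_examples(question, k):
        lines += [f"Question: {ex.question}", f"Intent: {ex.intent}", f"Topic: {ex.topic}", ""]
    lines += [f"Question: {question}", "Intent:"]
    return "\n".join(lines)


def route(intent: str) -> str:
    """Map the model's predicted intent to a data source, defaulting to articles."""
    return ROUTES.get(intent, "articles")
```

The prompt returned by `build_prompt` would be sent to the model, and the intent it predicts would be passed to `route` to decide where to retrieve context.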

Teaching AI how to research like a human

We had to balance simplicity and maintainability against accuracy and quality. The initial approach was easy to maintain but not as accurate, so we replaced our general-purpose agent with a custom-built agent made up of multiple dynamically configurable subsystems. These take each query through refinement, routing, source retrieval, and final synthesis steps to return an output.

As the quality of responses improved, the challenge became making sure that AskCR could still respond in a few seconds rather than minutes. To optimize for speed, we parallelized the work so that multiple subsystems could summarize information at the same time.
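
The refinement, routing, retrieval and synthesis steps, with per-source work running in parallel, can be sketched with Python’s `asyncio`. This is a minimal illustration under assumptions: every function body is a placeholder, the source names are hypothetical, and the real subsystems call models and databases rather than sleeping.

```python
import asyncio
from typing import List


async def refine(query: str) -> str:
    """Placeholder refinement: normalize whitespace, case and punctuation."""
    return " ".join(query.split()).rstrip("?").lower()


async def route(query: str) -> List[str]:
    """Placeholder routing rule: consult ratings only for 'best' questions."""
    sources = ["articles"]
    if "best" in query:
        sources.append("ratings")
    return sources


async def retrieve_and_summarize(source: str, query: str) -> str:
    """Placeholder for a slow retrieval plus summarization call."""
    await asyncio.sleep(0.05)
    return f"[{source}] summary for '{query}'"


async def answer(query: str) -> str:
    refined = await refine(query)
    sources = await route(refined)
    # Run the per-source work concurrently instead of one source after another;
    # gather returns results in the same order as the sources.
    summaries = await asyncio.gather(
        *(retrieve_and_summarize(source, refined) for source in sources)
    )
    # Placeholder synthesis: join the summaries into one response.
    return " | ".join(summaries)
```

With this structure the total wait is roughly that of the slowest source rather than the sum of all sources, which is the property that keeps response times in seconds as more sources are added.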

Teaching AI how to understand product features

CR has ratings, articles and information about thousands of products and cars in its databases, and each of these data sources describes different features.

Our next step was to write detailed instructions for the model on how to interpret CR’s structured ratings and reviews data. To do this quickly and at scale, we built a robust data pipeline capable of handling hundreds of product categories and dynamically generating data schemas for explainability purposes.

We also worked with CR’s product experts to define explicitly what the features of an air purifier, such as water removal and humidistat accuracy, mean to a consumer making a purchasing decision, so we could instruct the AI to interpret the data as a human would.
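
One way to picture dynamically generated schemas paired with expert definitions is the sketch below. It is hypothetical throughout: the field names, the glossary text and the output format are placeholders, and the real pipeline handles far more categories and data types.

```python
from typing import Any, Dict, List

# Hypothetical glossary; in practice the wording would come from product experts.
FEATURE_NOTES = {
    "water_removal": "Expert explanation of what this score means to a shopper (placeholder).",
    "humidistat_accuracy": "Expert explanation of what this score means to a shopper (placeholder).",
}


def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each field name to the Python type name(s) seen across the records."""
    seen: Dict[str, set] = {}
    for record in records:
        for field, value in record.items():
            seen.setdefault(field, set()).add(type(value).__name__)
    return {field: "|".join(sorted(types)) for field, types in sorted(seen.items())}


def schema_instructions(category: str, records: List[Dict[str, Any]]) -> str:
    """Render the inferred schema plus expert notes as text for a model prompt."""
    lines = [f"Fields available for category '{category}':"]
    for field, type_name in infer_schema(records).items():
        note = FEATURE_NOTES.get(field, "No expert description yet.")
        lines.append(f"- {field} ({type_name}): {note}")
    return "\n".join(lines)


# Made-up sample records, for illustration only.
SAMPLE = [
    {"model": "Model A", "water_removal": 4, "humidistat_accuracy": 5},
    {"model": "Model B", "water_removal": 3.5},
]
```

Because the schema is inferred from the records rather than written by hand, adding a new product category only requires supplying its data and any expert notes for its features.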

Introducing the Concept of Memory

By their nature, LLMs do not remember previous questions in a conversation on their own. To make sure AskCR was aware of past questions, we had to implement memory. Initially, it was just a buffer of the latest interactions, but by the end of the project we had built a hybrid solution that retained both the latest interactions and long-term memory.
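
A hybrid of a short buffer and longer-term memory can be sketched as follows. This is an assumption-laden illustration, not AskCR’s design: the `HybridMemory` class is hypothetical, and the one-line note it stores for old turns stands in for whatever summarization or storage the real system uses.

```python
from collections import deque
from typing import Deque, List, Tuple


class HybridMemory:
    """Keep the latest turns verbatim and a compact note for older ones."""

    def __init__(self, buffer_size: int = 2) -> None:
        self.recent: Deque[Tuple[str, str]] = deque(maxlen=buffer_size)
        self.long_term: List[str] = []

    def add_turn(self, question: str, answer: str) -> None:
        # When the buffer is full, the oldest turn is about to be evicted,
        # so record a compact note about it before appending the new turn.
        if self.recent and len(self.recent) == self.recent.maxlen:
            old_question, _ = self.recent[0]
            # Stand-in for a model-written summary of the evicted turn.
            self.long_term.append(f"Earlier the user asked: {old_question}")
        self.recent.append((question, answer))

    def context(self) -> str:
        """Text to prepend to the next prompt: long-term notes, then recent turns."""
        lines = list(self.long_term)
        for question, answer in self.recent:
            lines += [f"User: {question}", f"Assistant: {answer}"]
        return "\n".join(lines)
```

The buffer keeps recent turns exact so follow-up questions resolve correctly, while the long-term notes keep the prompt short as the conversation grows.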

User Testing & Load Testing

Throughout the project, NineTwoThree collaborated with CR’s internal user research team to gather numerous rounds of external user feedback on the design and concept. We also ran internal testing with Consumer Reports product testers and employees, who stress-tested the actual system to ensure it could handle the concurrent load of beta users.

Production Design

Users can ask questions about product categories or specific models.
AskCR provides plain-language answers and links to relevant CR articles. It is currently limited to categories and products examined by CR experts, with plans to expand coverage.

Prototype

Click here to experience the design of AskCR by interacting with our clickable prototype, which simulates the AI-powered shopping and research experience.

Impact

In June 2024, CR announced the successful beta launch of AskCR, available by invitation only. NineTwoThree was excited to bring our experience in AI/ML projects to this cutting-edge use of AI and support CR’s Innovation Lab team in building the next generation of CR tools that help consumers navigate the market.

Learn more about AskCR and join the waitlist!

If you’re interested in experiencing AskCR for yourself, you can join the waitlist here. The rollout of AskCR will be gradual, ensuring thorough testing and continuous improvement based on user feedback and performance evaluations.

How We Worked Together

Our teams worked together as one to deliver this product.
We met several times a week with CR’s Innovation Lab leadership team, Product Manager, Project Manager, and lead engineers to coordinate research, development, testing, and delivery, to strategize, and to review design.
NineTwoThree and Consumer Reports engineers collaborated by conducting regular code reviews and stand-ups to ensure alignment and quality.
We used GitHub for version control, Jira for project management, and Slack for continuous communication.
We conducted design review sessions with CR’s design team to ensure AskCR’s branding stayed consistent with the rest of CR.
We collaborated with CR’s user research team on planning internal and external user research rounds and adapting the design and model based on the feedback.

Other Screens


Thanks for reading!

NineTwoThree. 2024