What we delivered
AI (LLM), Chatbot, Web App & Product Strategy

Guardrailing an LLM to Surface Political Facts

Everytown wanted a way to quickly surface information from its inventory of articles and facts. NineTwoThree built a vector knowledge base and used semantic search across tables and blog posts to help researchers find answers faster.
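As a rough illustration of how a vector knowledge base answers a query, the sketch below stores an embedding for each text record and ranks records by cosine similarity to the query. The `embed` function is a hypothetical placeholder (hashed word counts) standing in for a real embedding model, so it matches on shared words rather than true meaning. The class and function names are also illustrative, not part of NineTwoThree's system.

```python
import math
import re
import zlib
from collections import Counter


def embed(text: str, dims: int = 256) -> list[float]:
    """Hypothetical placeholder embedding: hashed word counts, L2-normalised.

    A real system would call an embedding model here; this stand-in only
    captures word overlap, not semantic similarity.
    """
    vec = [0.0] * dims
    for word, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        # crc32 gives a deterministic bucket (unlike hash(), which is salted per process)
        vec[zlib.crc32(word.encode()) % dims] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorKnowledgeBase:
    """In-memory list of (text, vector) pairs searched by cosine similarity."""

    def __init__(self) -> None:
        self._records: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self._records.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity
        scored = [(sum(a * b for a, b in zip(q, vec)), text) for text, vec in self._records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

In production, the placeholder would give way to a real embedding model and the Python list to a vector database; the ranking step stays conceptually the same.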


Everytown’s staff and research assistants travel the country advocating for gun safety policies. Every city brings a new set of regulations and statistics they can use to support the policies they promote.

This data makes a powerful case, but combing through ever-changing records takes significant effort. Everytown could invest vast resources in extracting insights from this information and still end up quoting figures that are no longer current.

NineTwoThree used large language models (LLMs) to build an intelligent assistant that gives Everytown’s staff up-to-date information about local and state policies through a chatbot interface.


A topic as sensitive as gun safety leaves no room for common LLM failure modes such as outdated knowledge from training cutoffs, hallucinations, and unpredictable outputs. These risks created several hurdles for the project:

Fact-Checking & Reliability

Although LLMs are advanced, you can’t just feed them a spreadsheet and expect a quality result. For the output to be trustworthy, the data must first be transformed, cleaned, and formatted so the model can interpret it correctly.
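Here is a minimal sketch of that transformation step. It assumes the source data arrives as CSV with hypothetical columns `state`, `statistic`, `value`, `source`, and `as_of`; the real schema is not described in this case study. Each complete row becomes a self-contained sentence that keeps its source and date, and incomplete rows are skipped rather than guessed at.

```python
import csv
import io

# Hypothetical column names; the actual spreadsheet schema is an assumption.
REQUIRED = ("state", "statistic", "value", "source", "as_of")


def rows_to_records(csv_text: str) -> list[str]:
    """Turn spreadsheet rows into standalone sentences an LLM can read reliably."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalise header names and trim stray whitespace; skip overflow columns (key None)
        cleaned = {key.strip().lower(): (value or "").strip() for key, value in row.items() if key}
        if any(not cleaned.get(field) for field in REQUIRED):
            continue  # skip incomplete rows instead of inventing missing values
        records.append(
            f"In {cleaned['state']}, {cleaned['statistic']} was {cleaned['value']} "
            f"(source: {cleaned['source']}, as of {cleaned['as_of']})."
        )
    return records
```

Keeping the source and date inside each record lets the model cite where a figure came from and how current it is.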

What if the model misinterprets data in certain areas and presents the wrong insights? That would be bad in a typical consumer application, but catastrophic when discussing gun safety policy.

Information Overload

The documents we needed to ingest are often too large for the model to process in a single interaction, because they exceed its context window.

This can cause the model to use only part of the information provided, leading to incomplete responses, reduced accuracy, and lost information.
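A common way to handle this is to split long documents into overlapping chunks that each fit within the context window, then retrieve only the relevant chunks for each question. This sketch counts whitespace-separated words as a rough proxy for tokens, which is an assumption: real systems count tokens with the model's own tokenizer. The 200/40 defaults are arbitrary.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words.

    Words approximate tokens here; swap in a real tokenizer for production use.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary intact in at least one chunk, at the cost of some duplicated text.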

Complex Application Architecture

A chat interface is a reliable starting point; it’s intuitive for the end user and doesn’t need dozens of screens and user intents. Under the hood, though, it’s a different story.

We couldn’t rely on a single LLM to scale this solution. As noted above, a model can only process a finite amount of context, and the system needed more instructions than one LLM can reliably handle.
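The case study doesn't detail the architecture, but one way to avoid overloading a single prompt is to split the work into narrow stages, each handled by its own LLM call. In this hypothetical sketch, `llm` is any function that takes a prompt and returns text, and `retrieve` is any search function. The stages and prompts are illustrative, not NineTwoThree's actual pipeline.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: any prompt -> text function


def answer_question(question: str, retrieve: Callable[[str], list[str]], llm: LLM) -> str:
    """Answer via several small LLM calls instead of one oversized prompt."""
    # Stage 1: a narrow instruction that only rewrites the question as a search query
    query = llm(f"Rewrite as a short search query:\n{question}")
    # Stage 2: retrieval happens outside the model, so only relevant passages are sent on
    passages = retrieve(query)
    if not passages:
        return "No supporting source found."
    context = "\n".join(f"- {p}" for p in passages)
    # Stage 3: draft an answer grounded in the retrieved passages
    draft = llm(f"Answer using only these sources:\n{context}\nQuestion: {question}")
    # Stage 4: a separate call checks the draft against the same sources
    verdict = llm(f"Reply SUPPORTED or UNSUPPORTED.\nSources:\n{context}\nAnswer: {draft}")
    if verdict.strip().upper() == "SUPPORTED":
        return draft
    return "Answer withheld: not supported by sources."
```

Each stage carries one instruction, so no single call has to hold every rule at once. The checking stage also gives a natural place to refuse rather than guess.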

Tone & Guard Rails

The end user isn’t likely to abuse this application, but ill intent is a scenario that can’t be overlooked. The system also had to match Everytown’s tone.

Since this system functions as an extension of an employee, it needs to sound and act like one. There shouldn’t be an extra manual step for researchers to change the answer to match the company’s voice.
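One way to remove the manual voice-editing step is an automatic pass that checks each answer against the organization's preferred vocabulary. The `PREFERRED_TERMS` mapping below is hypothetical and illustrative only; it is not Everytown's actual “Words we use” list. The function rewrites flagged terms and reports which ones it found, so reviewers can track how often the model drifts from house style.

```python
import re

# Illustrative entries only (hypothetical); not the organization's real guidelines.
PREFERRED_TERMS = {"gun control": "gun safety", "utilize": "use"}


def apply_house_style(text: str, terms: dict[str, str] = PREFERRED_TERMS) -> tuple[str, list[str]]:
    """Replace discouraged terms (whole words, case-insensitive) and list what was flagged."""
    flagged = []
    for avoid, prefer in terms.items():
        pattern = re.compile(rf"\b{re.escape(avoid)}\b", re.IGNORECASE)
        if pattern.search(text):
            flagged.append(avoid)
            text = pattern.sub(prefer, text)
    return text, flagged
```

A deterministic pass like this complements prompt instructions: even if the model slips, the discouraged wording never reaches the researcher.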

Grading Criteria

Everytown needed a system that produced the same answers it expected from its team members.

A product like this needs extensive user testing before it goes into production, so the chat interface had to be easy to digest and make it simple for users to give quick feedback.
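The case study reports an answer match rate against expert-written responses but does not say how a match was scored. As one assumed approach, this sketch computes token-level F1 overlap between a generated answer and an expert reference. It counts an answer as a match when F1 reaches a hypothetical threshold of 0.5, and reports the share of matching answers.

```python
import re
from collections import Counter


def token_f1(answer: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between answer and reference."""
    a = re.findall(r"[a-z0-9]+", answer.lower())
    r = re.findall(r"[a-z0-9]+", reference.lower())
    common = sum((Counter(a) & Counter(r)).values())  # overlapping tokens, with multiplicity
    if common == 0:
        return 0.0
    precision = common / len(a)
    recall = common / len(r)
    return 2 * precision * recall / (precision + recall)


def match_rate(pairs: list[tuple[str, str]], threshold: float = 0.5) -> float:
    """Fraction of (answer, expert_reference) pairs whose F1 meets the threshold."""
    if not pairs:
        return 0.0
    hits = sum(token_f1(ans, ref) >= threshold for ans, ref in pairs)
    return hits / len(pairs)
```

Token overlap is cheap and repeatable, but it misses paraphrases. Expert review or an LLM-based grader is often layered on top.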


With the challenges understood, NineTwoThree could leverage Everytown’s existing knowledge base, along with the following measures, to set up the LLM to reach quality outputs quickly:



The project’s primary objective was to save researchers time, not create more tasks. The responses also had to follow Everytown’s “Words we use” guidelines, which outline the organization’s dos and don’ts.

It needed to sound like an extension of the team while remaining highly accurate. The final result did just that, performing well across a variety of benchmarks.
The MVP system passed their user standards and is currently in legal review to be deployed to a broader community of Everytown for Gun Safety activists.

This validated a safe, accurate, and frictionless use of AI that continues to save their researchers hours each day.
Passed all 27 validation suite metrics across dozens of expert-written questions.
Performant, reliable system that reduced research time from hours of active work to seconds.
Comprehensive Q&A user guide for writing prompts.
Passed all customer-defined UAT.
Median full answer response time of under 10 seconds.
85% answer match compared to expert-written responses.
