New advances in AI allow us to talk to computers in a language we understand. Thanks to large language models (LLMs), customers can get answers to specific questions without talking to a human. Marketers can conquer writer’s block with AI-generated drafts and templates based on a single prompt. Developers can debug an issue with an AI coworker who never gets tired.
LLMs like ChatGPT and Gemini are trained on billions of pieces of information. Their design allows them to understand natural language and patterns with remarkable accuracy. They’re incredible at helping marketers brainstorm ideas faster and engineers diagnose issues more easily. They’re also great at customer interaction and support, talking to humans the way humans talk. At least, mostly the way humans talk.
If you’re a security-focused organization trying to run an LLM at scale, though, you run into a few problems.
This article explores the best ways to integrate LLMs into your organization while maintaining a high level of security. We’ll detail how privacy-focused systems built on LLMs give security-conscious organizations the benefits of AI without the security and compliance risk of leaking sensitive data outside the company. Organizations that securely improve their applications with LLMs can unlock one of the greatest opportunities in AI.
LLMs allow us to perform a new wave of tasks we couldn’t do before by building on top of a concept called semantic search. Traditional search struggles to surface information when a query doesn’t match the data exactly. Think of storing food data in a database. It’s easy to ask our database for all types of food with “sandwich” in the name. It gets a little harder when you ask your database for “that thing they sell at Subway”. The database isn’t set up to understand what you’re asking for.
Semantic search is excellent at understanding the intent behind a query and communicating the retrieved information. We call this intent-based querying. If we can use semantic search to find the right information, we can use an LLM to take that information and give us an intelligent response. Combining the two is powerful. Semantic search finds the right places to look for "What's the best way to overcome writer's block?", and an LLM can take the five or six articles we get back, summarize them, and explain them.
Companies like Grammarly and Google have dramatically improved their core products in the past year by adding LLMs. Ask ChatGPT for a draft outline for a presentation, and it will instantly create one.
Given the clear advantages of an LLM, it can be tempting to start integrating and replacing. Why do I need a customer service team when I could have ChatGPT or Google’s Gemini start fielding my customers’ questions?
There are a few key reasons.
While LLMs are powerful, they’re not secure on their own if you want to use customer data. They’re also not reliable. Without access to the right information, they’ll make things up.
If we could combine the advantages of an LLM (understanding intent, communicating like a human) with a traditional database (allowing us to store and access specific, new information) we’d be able to improve LLM reliability.
In addition to storing information in a structured way, we’ll store it as a vector. A vector is just a large array of numbers. We take a document, like an onboarding guide, and compute its vector representation. This is called an embedding, and we store it in a database along with a reference to the original document.
The key idea is that documents with similar content will map to similar embeddings. If we generated embeddings for two news articles about the same event, the two embeddings would be closely aligned. We also don’t have to worry about new data having a different structure. As long as the embedding model can generate an embedding, we can compare similar pieces of content to each other.
When we search, we can ask in plain English “What’s a great itinerary for a vacation to Alaska?” instead of writing a SQL query. Then we convert that question into an embedding and find the closest matches in our database. We perform vector search to find these similar embeddings and return their underlying documents. We’ll find documents with great information, like Reddit posts on traveling to Alaska or a blog post from a solo Alaska traveler.
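As a minimal sketch of this flow, assuming the open-source sentence-transformers library and an in-memory store standing in for a real vector database, the embed-and-search step might look like this:

```python
# A minimal sketch of embedding documents and running a vector search.
# Assumes the sentence-transformers package; a production system would
# use a dedicated vector database instead of an in-memory array.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example open-source embedding model

documents = [
    "A two-week Alaska itinerary: Anchorage, Denali, and the Kenai Peninsula.",
    "How to overcome writer's block with daily freewriting exercises.",
    "Subway's most popular sandwiches, ranked.",
]

# Embed every document; in a real system we'd also store a reference
# back to the original document alongside each embedding.
doc_embeddings = model.encode(documents, normalize_embeddings=True)

def search(query: str, top_k: int = 2):
    """Embed the query and return the top-k most similar documents."""
    query_embedding = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_embeddings @ query_embedding  # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:top_k]
    return [(documents[i], float(scores[i])) for i in best]

print(search("What's a great itinerary for a vacation to Alaska?"))
```

The same pattern scales up to a dedicated vector database; the important part is that both documents and queries pass through the same embedding model.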
But vector databases are just a store for this data.
How do we give an LLM access to this information?
Enter Retrieval-Augmented Generation, or RAG.
This technique leverages both LLMs and vector databases. Instead of an LLM being limited to its training data, it can access information in real-time.
A typical application architecture without RAG might look like this.
RAG adds another step before we send results to the application: first, we send the documents behind the top-K vectors to an LLM. The LLM can draw upon its training data and the presented data, parse and understand the top results, and craft a better answer than we could.
This architecture allows the LLM to access real-time data from the vector database, even if that data wasn’t in its training set.
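A rough sketch of that extra step, reusing the `search` helper from the earlier example; the provider, model name, and prompt wording here are illustrative, not a prescribed setup:

```python
# A rough sketch of the RAG step: retrieve the top-K documents, then hand
# them to the LLM alongside the user's question. The OpenAI client and
# model name are illustrative; a privately hosted model works the same way.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

def call_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def answer_with_rag(question: str, top_k: int = 3) -> str:
    # 1. Retrieve the top-K most relevant documents from the vector store.
    retrieved = search(question, top_k=top_k)
    context = "\n\n".join(doc for doc, _score in retrieved)

    # 2. Augment the prompt so the model prefers the retrieved context
    #    over whatever it remembers from training.
    prompt = (
        "Answer the question using the context below, and prefer the "
        f"context over your general knowledge.\n\nContext:\n{context}\n\n"
        f"Question: {question}"
    )

    # 3. Generate the final answer.
    return call_llm(prompt)
```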
Now you have a smarter layer for locating the right data, and the results pass through an LLM. Instead of parsing the responses yourself, you hand them to an LLM, which is much better at understanding them.
We can do a few new things now. We can ask the LLM to weigh information from the vector database differently than its training data. If a customer asks “What does JIT mean?”, we don’t have to worry about the LLM making a guess based on training data; it will pull from our knowledge base, and weigh that answer higher.
We can also let the LLM continuously analyze our company’s data and learn from it, to understand how our company talks and operates. LLMs excel at this; the more direction we give them, the better they perform. As a company’s knowledge base evolves and changes, the LLM can learn with it.
So, we’ve solved the problem of reliability. Instead of forcing the LLM to rely on the data it was trained on, we can give it access to an external data store.
You’re probably thinking, “How would I guarantee my company’s data is secure?”
There are a few ways to do this.
Option 1: Use an existing solution from a large AI provider: Companies like OpenAI offer enterprise solutions. You host your application and vector database within your company’s network, then make encrypted requests to OpenAI’s services for the LLM. This approach is far easier, but it has a few key pitfalls.
Pros
Cons
Option 2: Build your own privately-hosted LLM solution: Bring a high-quality, open-source LLM inside your firewall and build the entire solution in-house. Meta and Google both publish robust, powerful open-source models you can use. Instead of making an external call to the LLM, you integrate it into your application’s architecture locally. Make no mistake, this is significantly more difficult and involves nontrivial talent and infrastructure costs. But it does come with some advantages.
Pros
Cons
There are pros and cons to each approach. Mature companies with the CapEx and expertise can benefit from building their own privately-hosted LLM. Early adopters might appreciate the flexibility of existing AI solution providers like OpenAI.
Picking the setup isn’t all there is to it, though. Setting the system up takes time and a careful approach.
Here’s what it might look like when creating a production system step-by-step.
GPT, Llama, Claude: these are all base models trained on large datasets. If you’re picking an enterprise private LLM provider, you’ll connect to that model’s API via your cloud infrastructure. If you’re creating a privately-hosted solution, you’ll download the LLM locally within your network.
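For the privately-hosted route, a minimal sketch of pulling an open-source base model inside your network might look like the following; the model name is just an example, and serving it in production typically involves a dedicated inference layer and GPUs:

```python
# A minimal sketch of loading an open-source base model for private hosting.
# Assumes the transformers and accelerate packages; the model name is
# illustrative, and gated models require access approval on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # example open model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Quick smoke test: generate a short completion locally.
inputs = tokenizer("What does JIT mean?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```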
Pick a subdomain or subtask (HR case handling, customer support) and gather all your documents. We’ll fine-tune the base model by giving it example documents and questions from your data. This allows the model to key in on the specific customer use case it’s trying to solve.
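The fine-tuning data itself is usually a set of question-and-answer examples drawn from those documents. A sketch of what preparing it might look like, using a chat-style JSONL format that many training pipelines accept; the exact schema depends on your provider or framework, and the example records are purely illustrative:

```python
# A sketch of preparing fine-tuning examples from your own documents.
# The JSONL chat format shown here is common but provider-specific;
# check the exact schema your training framework expects.
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "You are the company's HR assistant."},
            {"role": "user", "content": "How do I submit an HR case?"},
            {"role": "assistant", "content": "Open the internal HR portal and choose 'New case', then attach any relevant documents."},
        ]
    },
    # ...more examples drawn from HR cases, support tickets, internal docs, etc.
]

with open("finetune_train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```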
During this stage, it’s important to keep any sensitive user data secure. Access control and data encryption are critical steps. There are a few techniques for securing user data within the model.
Federated learning: Split model training across multiple local devices instead of one centralized server. Each device downloads the model and trains it on its own private data, which never leaves the device.
Differential privacy: Add calibrated noise during the training phase so that no individual user’s data can be recovered from the model (a rough sketch follows below).
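As a toy illustration of the differential privacy idea, in the style of DP-SGD and not a production implementation, each example’s gradient is clipped and noised before the update is applied:

```python
# A toy illustration of the differential-privacy idea (DP-SGD style):
# clip each example's gradient, then add calibrated noise before applying
# the update, so no single user's data dominates what the model learns.
# This is a sketch, not a production implementation.
import numpy as np

def private_gradient_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # clip each gradient
    summed = np.sum(clipped, axis=0)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_example_grads)  # noisy average update

# Example: three per-example gradients for a model with 4 parameters.
grads = [np.random.randn(4) for _ in range(3)]
print(private_gradient_step(grads))
```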
Continue to improve your LLM’s performance. Use a technique like Reinforcement Learning from Human Feedback (RLHF), where real humans rate and rank the model’s outputs. Establish clear benchmarks and measure the model against them.
Take the same documents from your subdomain or subtask and gather them to be embedded. Pick an embedding model, then generate the vector embeddings and store them in a vector database. Ensure these embeddings are generated and stored in a secure, compliant manner.
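A sketch of this indexing step using FAISS as a locally hosted index; the chunking, embedding model, and file path are assumptions, and a managed vector database follows the same pattern:

```python
# A sketch of embedding a document set and loading it into a local FAISS
# index. FAISS is one option; managed vector databases work the same way
# conceptually. Chunk sizes and the embedding model are illustrative.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# In practice these chunks come from your HR or support document store,
# split into pieces small enough to embed and retrieve usefully.
chunks = [
    "Chapter 3: submitting an HR case through the internal portal...",
    "Escalation policy: cases unresolved after 48 hours go to a manager...",
]

embeddings = model.encode(chunks, normalize_embeddings=True)
embeddings = np.asarray(embeddings, dtype="float32")

index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on normalized vectors
index.add(embeddings)

faiss.write_index(index, "kb.index")  # persist alongside a mapping back to the source chunks
```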
Once you’re production-ready, integrate the LLM into your application architecture. When that is functioning, connect the LLM to your vector database. Once the two are connected, the LLM can draw on your data to improve its answers.
As the retrieval-augmented LLM learns more about your organization, you can give it new tasks to handle.
If any of this sounds confusing, fear not: we do this for clients every day. Contact us and we’re happy to walk you through every step of the process.
We’re skipping over a lot of steps, but that’s the basic architecture.
A full production system needs one last orchestrator to ensure the application functions correctly.
It’s one thing to give one LLM access to one vector database for one use case. Often, there’s more nuance. An LLM might have access to all types of company data: spreadsheets in one place, knowledge-base articles in another.
It’s critical to set up an “agent” to manage these decisions. If a customer asks a question, which knowledge base should we pull answers from? Is this a general question we can pull from one data store, or a specific question about a product in a separate database?
We’ll set up this decision-making layer, called an "agent", between the frontend and the RAG system.
Agents can also escalate when we don’t have the information we need. They can check multiple databases for an answer to the customer’s question, even if the first (or second, or third) doesn’t have what we’re looking for. If the answer isn’t there, they can relay that to the customer.
RAG is a great way to retrieve information from a vector database; knowing which database to look through is where an agent excels. The agent knows the internal workings of the application, the context behind the customer, and has specific instructions for what to do.
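As a very rough sketch, an agent at this layer might order the knowledge bases, query them in turn, and escalate when nothing relevant comes back. The store names, routing rule, and threshold below are hypothetical, and each store is assumed to expose the same search interface as the earlier examples:

```python
# A rough sketch of the agent layer: pick the most likely knowledge base,
# fall back to the others, and escalate if nothing relevant is found.
# The store names, routing rule, and threshold are hypothetical; each store
# is assumed to be a search function like the one in the earlier sketch,
# and call_llm is the helper from the RAG sketch above.

RELEVANCE_THRESHOLD = 0.6  # minimum similarity before we trust a retrieved result

def route(question: str) -> list[str]:
    """Order the knowledge bases by how likely they are to hold the answer."""
    # A real agent might ask an LLM or a trained classifier to make this call.
    if "product" in question.lower() or "pricing" in question.lower():
        return ["product_docs", "general_kb"]
    return ["general_kb", "product_docs"]

def agent_answer(question: str, stores: dict) -> str:
    for name in route(question):
        doc, score = stores[name](question, top_k=1)[0]
        if score >= RELEVANCE_THRESHOLD:
            # Hand the retrieved context to the RAG step from the earlier sketch.
            return call_llm(f"Answer using this context:\n{doc}\n\nQuestion: {question}")
    # No store has anything relevant: say so instead of letting the model guess.
    return "I couldn't find that in our knowledge base, so I'm escalating to a teammate."
```

In practice the routing decision is often made by an LLM or a small classifier rather than keywords, but the fallback and escalation logic stays the same.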
Ten years ago, every enterprise company hoarded as much data as they could. They didn’t know what to do with it, or how to structure it.
Now, thanks to powerful new large language models, they do. It’s true that not every industry needs enhanced artificial intelligence like ChatGPT. But so far, it’s tough to find an industry or organization that can’t benefit from LLMs in some way.
Data is the new oil for the AI world. Once you own it and vectorize it, you should be able to monetize it.
Don’t overlook this fact. Your data, industry knowledge, and processes are your competitive edge. LLMs understand all three, and can use them.
Although anyone can access ChatGPT, only authorized users can work with a private LLM. With a private LLM there are no data leaks: everything is hosted within the firewall. There’s also no third-party involvement, period. This makes it much easier to ensure compliance in a regulated industry. An internal LLM is all-knowing about the company, not the internet.
You don’t have to worry about OpenAI killing your feature or your entire product because they know your secrets. You and only you have access to your knowledge base, your data, and your processes.
No scrambling over deprecation windows for public cloud services. Outages in us-east-1 no longer affect your AI. Cost is dictated by how you manage your stack, not by how much an external LLM provider raises its prices. What’s more, you can access it without an external internet connection, as long as you’re on the corporate network.
Fine-tuning an LLM on your organization’s data will improve its performance. Instead of giving generic answers, your LLM tailors its performance to exactly what your customer is asking for. You can prompt engineer as much as you’d like and provide any context you need. And you’re not making any external API calls, so there’s as close to zero latency as you can ask for.
Increasing your organization’s velocity with generative AI doesn’t have to be a tradeoff of performance and security. There are safe, scalable ways to integrate LLMs into your company’s knowledge base. We’ll help you build them.
Setting up a demo LLM might take you a day.
Creating a production-ready system can take years. And you’ll need experts guiding you who know your blind spots.
You’ll need experts who can improve the current systems with a proven methodology. They’ll also be able to help you set up observability and regression detection. You’ll need to know how to detect malicious actors and security leaks, and when to add important guardrails.
NineTwoThree helps customers answer all of these questions and more.
If you’re concerned with building the system the right way, we can help.
NineTwoThree is a leading provider of AI application development services and has been building AI applications since 2016. We have a deep understanding of RAG and generative AI, and a proven track record of putting both into production successfully.
We have already built 7 applications using retrieval-augmented generation and generative AI, and truly understand the technology and how to use it to solve real-world problems.
Contact us to learn more about our generative AI services today.