source: eva hesse styled AI image. https://midlibrary.io/styles/eva-hesse

Making LLMs Work with Your Company Data

Stop using LLMs like off-the-shelf software.

George Salapa
8 min read · Jan 11, 2025

--

There is a class of people across social media (most prevalent on LinkedIn) who love to post, with a sense of personal triumph, ChatGPT’s blunders when it is tested on questions requiring current, factual knowledge (like “How did my country’s income tax law change last year?”) without giving the AI the facts. Oh, how they marvel and dance upon the LLM's stumbles as it generates plausible-sounding but fabricated answers. “Look!” they say, “how can AI be trusted or ever be useful to you when it lies like this?”

Aside from my personal distaste for the technological pessimism (scaremongering), these observations fundamentally misunderstand the nature of LLMs by treating them like traditional software: input -> output, always correct, never mistaken, deterministic. In fact, an LLM is nothing like that. It is a weird and wonderful machine, akin to a text calculator. A high-IQ conduit that you negotiate with. Each question is a lottery draw.

It is an intellectual superpower, but you must wield it wisely. On deterministic tasks like coding, frontier LLMs excel out of the box — no additional context needed. But for questions requiring factual, current information — topics of a more open-ended nature — you must provide the necessary source material.

This is also a great opportunity for companies of all sizes to transform their stale knowledge into intelligence that explains itself to employees. In practice, the magic of LLMs emerges when we augment each conversation turn with relevant chunks of organizational knowledge. This technique is called Retrieval-Augmented Generation (RAG).

Think of the wealth of institutional knowledge buried in your organization: archived emails, regulatory filings, past presentations, legal contracts, technical documentation. This data, while technically accessible, often lies untapped in various silos. By developing your own retrieval architecture that connects these dots intelligently, you’re not just building another corporate tool — you’re creating your organization’s intellectual advantage. Each query to this system becomes a gateway to years of accumulated wisdom, making your entire organizational knowledge base instantly accessible and actionable. In a world where everyone has access to the same frontier LLMs, it is this layer — your unique, proprietary intelligence retrieval system — that becomes your competitive edge.

The How.

When you search for a document on your computer, you get results based on exact document names and matching keywords in the content. But the traditional systems you use (Windows Search, SharePoint, Mac’s Finder) struggle with the context of documents, with finding conceptually similar documents, or with connecting related information across silos. Traditional search: search “contract termination clause” -> get results containing those exact words BUT miss relevant documents that use terms like “agreement end terms.”

Say hello to vector stores — a method to find information based on meaning, not just matching words. Think of it like this: each document gets transformed into a mathematical representation (a vector) that captures its semantic essence. When you search, the system finds documents whose mathematical “meaning vectors” are closest to your query’s vector. This is why vector search can find a document about “agreement dissolution” when you search for “contract termination” — it understands they’re conceptually similar.

To understand vector search, we first need to grasp embeddings — numerical representations of text that capture meaning. For our example, we use an OpenAI embedding model, which works very well with frontier LLMs. Embeddings transform each piece of text into a list of numbers (a vector), where similar meanings result in similar vectors. How are embedding models created? Start with a massive text corpus. Train a model on pairs of related text. The model learns to make similar texts have similar vectors.
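To make this concrete, here is a minimal sketch using OpenAI's Python SDK; the model name, example texts, and the cosine-similarity helper are illustrative choices, not a prescription:

# pip install openai numpy
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    # "text-embedding-3-small" is one OpenAI embedding model; swap in whichever you use
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = embed("contract termination clause")
v2 = embed("agreement end terms")
v3 = embed("quarterly sales report")

print(cosine_similarity(v1, v2))  # high: conceptually similar
print(cosine_similarity(v1, v3))  # lower: unrelated topic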

Now, to efficiently search through these vectors, we use FAISS (Facebook AI Similarity Search) — a library that solves the fundamental challenge of finding similar vectors quickly in a sea of millions. FAISS uses clever techniques like vector quantization (grouping similar vectors together) and inverted indexes (quick lookup structures) to make this search very very fast.

Imagine a massive library: FAISS organizes information in three levels for efficient search. First, it divides books into main sections like Science, History, and Arts (clustering using IVF - Inverted File Index). Then, within each section, it arranges books onto specific shelves like Physics, Chemistry, Biology (quantization). Finally, it creates detailed catalog numbers that describe each book's specific topics - like marking a Physics book as "30% Quantum, 40% Mechanics, 30% Thermodynamics" (product quantization). When searching, FAISS first picks the most relevant sections (e.g., Science for a quantum physics query), then checks the most promising shelves in those sections (e.g., Physics shelf), and finally compares the detailed catalog numbers to find the most similar books. This three-level organization allows FAISS to search millions of books without checking every single one - just like you wouldn't search the Arts section for a quantum physics book. If you are interested in code, I also include a snippet in the notes below the line (1).
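If you want to see the analogy itself in code, here is a hedged sketch of that three-level structure using FAISS's IndexIVFPQ; the dimensions, cluster counts, and random vectors are placeholders for your own embeddings:

# pip install faiss-cpu numpy
import numpy as np
import faiss

d = 1536        # embedding dimension (e.g. OpenAI embeddings)
nlist = 100     # number of "sections" (IVF clusters)
m = 8           # sub-vectors per embedding for product quantization
nbits = 8       # bits per sub-vector code

vectors = np.random.random((10_000, d)).astype("float32")  # stand-in for document embeddings

quantizer = faiss.IndexFlatL2(d)                  # coarse quantizer that assigns vectors to clusters
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(vectors)   # learn the clusters and PQ codebooks
index.add(vectors)     # add (compressed) vectors to the index

index.nprobe = 10      # how many "sections" to visit at query time
query = np.random.random((1, d)).astype("float32")
distances, ids = index.search(query, 5)
print(ids[0])          # positions of the 5 most similar documents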

source: this is how AI imagines FAISS could look.

While FAISS is powerful, it’s low-level, meaning it does not offer methods to manage the vector store over its lifetime, e.g. adding more documents, deleting documents, or updating the database. This is solved by vector store products and libraries that offer these methods. One such library that you can run on your own device / in your application is Chroma. If you are interested in code, I also include a snippet in the notes below the line (2).

These are local vector stores — meaning you literally install a library on your machine / inside the app you are building and use it. This is great but not scalable beyond a certain point. Enterprise-grade vector stores like Pinecone, Weaviate, and Qdrant provide managed services — you use them as a service.

So What?

It is the combination of LLM + vector store that makes the magic happen. When you ask a question about your company’s policies, contracts, or procedures, the vector store instantly finds not just documents containing exact keyword matches, but conceptually relevant information across your entire knowledge base. A query about “employee performance improvement” might surface relevant sections from HR policies, training documents, and past review templates — even if they use different terminology. The LLM then synthesizes these pieces into a coherent, contextual response, effectively turning your static document repository into an intelligent knowledge system that speaks your company’s language.

And this is where traditional enterprise search dies. No more exact word matching, no more sifting through irrelevant results, no more “did you mean…” suggestions that miss the point entirely. Vector search + LLM is your organization’s collective brain: it doesn’t just find the relevant pieces, it weaves them together into coherent, contextual answers that actually make sense.
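As a rough sketch of that loop (the vector_store object and its similarity_search method are placeholders, not a specific product's API):

from openai import OpenAI

client = OpenAI()

def answer(question: str, vector_store) -> str:
    # 1. Retrieve the chunks whose meaning is closest to the question
    chunks = vector_store.similarity_search(question, k=5)  # hypothetical retrieval call

    # 2. Augment the prompt with the retrieved organizational knowledge
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer using only the company documents below. "
        "If the answer is not in them, say so.\n\n"
        f"Documents:\n{context}\n\nQuestion: {question}"
    )

    # 3. Let the LLM synthesize a coherent, contextual response
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# print(answer("How do we handle employee performance improvement?", store))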

Let’s Do Better.

Vector stores are incredibly powerful, enabling machines to search & find semantically. But can we do better? What if your company’s data is spread across your CRM showing customer interactions, project management systems tracking deliverables, sales pipeline data, product specifications, and regulatory compliance records? Yes, we can vectorize all this, but we’d miss something crucial: relationships.

This is where we introduce knowledge graphs. Instead of just throwing all data into a vector store, we first create a structured representation of how information interconnects. Imagine a customer node linked to their orders, which connect to product specs, which link to supplier agreements and quality reports. Or a project milestone connected to team members, linking to their expertise records and past successes.

Building such a knowledge graph creates a powerful first layer in our search architecture. When you ask a question in natural language, the system:

1. Identifies relevant starting nodes,

2. Traverses the graph to find related nodes through explicit relationships,

3. Searches the vector store, but only within these identified nodes,

4. Returns not just semantically similar content, but also the explicit relationships that connect them.

Let’s break down how we build this locally. If you are interested in code, I also include a snippet in the notes below the line (3). We define nodes (entities) and relationships (connections between entities). As we process documents, we use a small and quick model like gpt-4o-mini to identify these elements and their relationships. We store this structure in a simple JSON file, which we can load into a NetworkX graph for efficient traversal.

NetworkX is crucial here — it’s a Python library that implements graph algorithms. When we search, it helps us find paths (sequences of connected nodes) using algorithms like nx.single_source_shortest_path, which starts at a node and explores the graph level by level, returning the shortest path to every node reachable within a specified depth.
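A toy example of that traversal, with made-up node names:

import networkx as nx

G = nx.Graph()
G.add_edge("Project Alpha", "SPEC-123")
G.add_edge("Project Alpha", "John Smith")
G.add_edge("John Smith", "Project Gamma")

# All shortest paths from "Project Alpha" to nodes at most 2 hops away
paths = nx.single_source_shortest_path(G, "Project Alpha", cutoff=2)
for target, path in paths.items():
    print(" -> ".join(path))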

Let’s see this in action. Imagine asking: “What projects use technologies similar to Project Alpha?”

1. The graph traversal finds: the Project Alpha node, its specification documents, related projects through shared components, and the team members involved.

2. The paths reveal relationships: Project Alpha -> SPEC-123 (tech stack doc); Project Alpha -> Project Beta -> ARCH-789 (shared architecture); Project Alpha -> John Smith -> Project Gamma (team lead overlap).

3. The vector store then searches within these identified documents, finding semantic matches about technologies, frameworks, and architectures.

This combination of semantic similarity and explicit relationships provides the LLM with both the “what” (similar technologies) and the “how” (project connections).

source: created by gpt-4o with code interpreter in an application that I have developed for a client.

For small to medium graphs, this local approach works well. However, as your graph grows to millions of nodes and relationships, you’ll want to move to Neo4j. The key difference? Neo4j uses specialized graph algorithms and indexing to find paths without checking every node. Instead of using GPT to identify starting points, Neo4j uses labels, properties, and indexes to quickly find relevant nodes. It can then traverse relationships using optimized graph algorithms that understand the physical storage layout of the nodes.
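For illustration, a query like the Project Alpha example might look like this with the official Neo4j Python driver; the labels, relationship types, and connection details are assumptions about a hypothetical schema, not your data model:

# pip install neo4j
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Find projects that share technologies with Project Alpha (hypothetical schema)
query = """
MATCH (p:Project {name: $name})-[:USES]->(t:Technology)<-[:USES]-(other:Project)
RETURN DISTINCT other.name AS project, collect(t.name) AS shared_tech
"""

with driver.session() as session:
    for record in session.run(query, name="Project Alpha"):
        print(record["project"], record["shared_tech"])

driver.close()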

By building intelligent retrieval systems that understand both semantic similarity and business relationships, you transform an out-of-the-box LLM into your company’s intellectual moat.

(1)

In this example, we use FAISS in its simplest form, just searching vectors. As we assume our library is not yet large, we can brute-force the search and simply use FAISS’s optimized distance calculations.
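A minimal sketch along those lines, pairing OpenAI embeddings with a flat (brute-force) FAISS index; the documents and model name are illustrative:

# pip install faiss-cpu openai numpy
import numpy as np
import faiss
from openai import OpenAI

client = OpenAI()

documents = [
    "Employment agreements may be terminated with 30 days written notice.",
    "Quarterly sales figures are reported to the board in January.",
    "Supplier contracts include quality assurance requirements.",
]

def embed(texts):
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data], dtype="float32")

doc_vectors = embed(documents)
index = faiss.IndexFlatL2(doc_vectors.shape[1])  # brute-force L2 search, no training needed
index.add(doc_vectors)

query_vector = embed(["contract termination clause"])
distances, ids = index.search(query_vector, 2)
for i in ids[0]:
    print(documents[i])  # semantically closest documents first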

(2)
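Note (2) referred to Chroma. Here is a minimal sketch of a persistent local collection with add, update, delete, and query; the document texts, IDs, and metadata are illustrative, and Chroma falls back to its default embedding function unless you configure one:

# pip install chromadb
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="company_docs")

collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Employment agreements may be terminated with 30 days written notice.",
        "Supplier contracts include quality assurance requirements.",
    ],
    metadatas=[{"source": "hr_policy.pdf"}, {"source": "procurement.pdf"}],
)

# Lifecycle methods FAISS alone does not give you:
collection.update(ids=["doc-1"], documents=["Employment agreements require 60 days written notice."])
collection.delete(ids=["doc-2"])

results = collection.query(query_texts=["contract termination clause"], n_results=1)
print(results["documents"][0])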

(3)

First we define the Graph and its methods: a simple graph literally stored as JSON on disk.
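A sketch of what such a class might look like (the file name and schema are illustrative):

import json
import networkx as nx

GRAPH_PATH = "knowledge_graph.json"  # illustrative file name

class KnowledgeGraph:
    """A simple graph of nodes and relationships, persisted as JSON on disk."""

    def __init__(self, path=GRAPH_PATH):
        self.path = path
        self.data = {"nodes": [], "relationships": []}

    def add_node(self, node_id, node_type, properties=None):
        self.data["nodes"].append({"id": node_id, "type": node_type, "properties": properties or {}})

    def add_relationship(self, source, target, relation):
        self.data["relationships"].append({"source": source, "target": target, "relation": relation})

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.data, f, indent=2)

    def load(self):
        with open(self.path) as f:
            self.data = json.load(f)

    def to_networkx(self):
        # Build a directed NetworkX graph for efficient traversal
        G = nx.DiGraph()
        for node in self.data["nodes"]:
            G.add_node(node["id"], type=node["type"], **node["properties"])
        for rel in self.data["relationships"]:
            G.add_edge(rel["source"], rel["target"], relation=rel["relation"])
        return G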
We use a small model to create the nodes and relationships of the Graph from the text we load from documents.
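A hedged sketch of that extraction step; the prompt wording and JSON schema are assumptions:

import json
from openai import OpenAI

client = OpenAI()

EXTRACTION_PROMPT = """Extract entities and relationships from the text below.
Return JSON with two keys: "nodes" (each with "id" and "type") and
"relationships" (each with "source", "target", "relation").

Text:
{text}"""

def extract_graph_elements(text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.format(text=text)}],
        response_format={"type": "json_object"},  # ask the model to return valid JSON
    )
    return json.loads(response.choices[0].message.content)

# Feeding the results into the KnowledgeGraph class sketched above:
# elements = extract_graph_elements("Project Alpha is led by John Smith and uses the SPEC-123 tech stack.")
# for node in elements["nodes"]:
#     graph.add_node(node["id"], node["type"])
# for rel in elements["relationships"]:
#     graph.add_relationship(rel["source"], rel["target"], rel["relation"])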
Finally, this is how we search. What we end up with is contextually much richer data that we can attach to our question before sending it to the LLM.
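And a sketch of the search step, building on the KnowledgeGraph class above; the vector store's similarity_search method and its filter parameter are placeholders for whatever retrieval API you use:

import networkx as nx

def graph_filtered_search(question, graph, vector_store, starting_nodes, depth=2, k=5):
    """Traverse the graph from the starting nodes, then search vectors only within
    the documents attached to the nodes we reached."""
    G = graph.to_networkx()

    # Steps 1 + 2: find every node reachable within `depth` hops of the starting nodes
    reachable, paths = set(), []
    for start in starting_nodes:
        shortest = nx.single_source_shortest_path(G, start, cutoff=depth)
        reachable.update(shortest.keys())
        paths.extend(shortest.values())

    # Step 3: vector search restricted to documents linked to those nodes (hypothetical filter API)
    chunks = vector_store.similarity_search(question, k=k, filter={"node_id": list(reachable)})

    # Step 4: return both the semantic matches and the explicit relationships that connect them
    return {"chunks": chunks, "paths": [" -> ".join(p) for p in paths]}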

Written by George Salapa

Thoughts on technology, coding, money & culture. Wrote for Forbes and Venturebeat before.
