Building an AI chatbot for small business that actually knows your products

The RAG stack we ship for small business chatbots: ingestion, embeddings, retrieval, guardrails, and the boring parts that decide whether it answers correctly or hallucinates a refund policy.

KEY TAKEAWAYS
  • Most small business chatbots fail at retrieval, not generation — fix ingestion before you touch prompts.
  • Chunk by semantic boundary (product, FAQ, policy), not by token count, or you'll split a price from its product name.
  • Hybrid search (BM25 plus embeddings) beats pure vector search for product catalogs with SKUs and exact names.
  • Log every unanswered question. That log is your roadmap, not your analytics dashboard.
  • Guardrails belong in the system prompt and in code. Trust neither alone.

A client sent me a screenshot last month. Their chatbot, built by someone else, confidently told a customer their store offered free shipping to Australia. They do not ship to Australia. They never have.

This is the default state of an AI chatbot for small business right now. It sounds great in the demo, then it invents a return policy on a Tuesday and you find out on Friday from an angry email. The model is not the problem. The retrieval is.

We’ve built enough of these at EdsDev to have opinions about what actually works. Here’s the RAG stack we use, why each piece is there, and the parts most tutorials skip.

What a customer service AI chatbot actually needs to know

Before any code, sit down and list every question a real customer has asked in the last 90 days. Pull from email, Instagram DMs, the contact form, whatever you have. You’ll find the answers cluster into four buckets:

A good AI bot for small business handles the first two cleanly, escalates the third to an order lookup tool, and is honest about the fourth. Most bots try to fake their way through all four. That’s where the Australia shipping incident comes from.

The stack

Here is what we usually ship, give or take the client’s existing tools:

No Pinecone. No LangChain. No vector database SaaS with a $70/month floor. For a shop with 200 products and 40 FAQ entries, pgvector on a Hetzner VPS will serve a chatbot until you have real scale problems, and by then you can afford to migrate.

Ingestion is 70% of the work

This is the part nobody writes blog posts about because it’s not fun. But if your data going in is garbage, no amount of prompt engineering saves you.

For a Shopify client we pull products like this:

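// Assumes `shopify` is an authenticated Admin API client and `shop` is the
// store domain. A single call returns at most 250 products; larger catalogs
// need pagination.
const products = await shopify.product.list({ limit: 250 });

for (const p of products) {
  const doc = {
    id: `product:${p.id}`,
    title: p.title,
    // Structured fields: stored as columns, kept out of the embedding.
    // Only the first variant's SKU and price are captured here.
    sku: p.variants[0]?.sku,
    price: p.variants[0]?.price,
    in_stock: p.variants.some(v => v.inventory_quantity > 0),
    // body_html can be null for products without a description.
    body: stripHtml(p.body_html ?? ''),
    // Tags arrive as one comma-separated string; an empty string means no tags.
    tags: p.tags ? p.tags.split(',').map(t => t.trim()) : [],
    url: `https://${shop}/products/${p.handle}`,
    updated_at: p.updated_at,
  };
  await upsert(doc);
}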

Notice what we keep separate: sku, price, in_stock. These don’t go into the embedding. They go into structured columns so the LLM can be told “the SKU is XYZ-42, the price is $89, currently in stock” as plain facts. Embeddings are great for fuzzy matching. They are bad at remembering that the medium is $5 more than the small.
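Here is a minimal sketch of how those structured columns can be turned into plain facts for the prompt. The `ProductFacts` shape and the `renderFacts` name are ours for illustration, and the dollar sign assumes a USD store:

```typescript
// Hypothetical shape: the structured fields kept out of the embedding.
interface ProductFacts {
  title: string;
  sku?: string;
  price?: string; // assumed to be a decimal string such as "89.00"
  in_stock: boolean;
  url: string;
}

// Render the structured columns as plain statements the model can quote.
function renderFacts(p: ProductFacts): string {
  const lines = [`Product: ${p.title}`];
  if (p.sku) lines.push(`SKU: ${p.sku}`);
  if (p.price) lines.push(`Price: $${p.price}`); // assumption: USD
  lines.push(p.in_stock ? 'Currently in stock' : 'Currently out of stock');
  lines.push(`URL: ${p.url}`);
  return lines.join('\n');
}
```

The output goes into the context next to the retrieved description, so the model reads the price instead of guessing it from prose.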

For the policy doc, we don’t chunk by 500-token windows. We split on H2 headings. One chunk per policy section. Returns policy is one chunk. Shipping policy is one chunk. If a policy is 2,000 tokens, fine, it’s one chunk of 2,000 tokens. Splitting it in the middle is how you get the bot confidently quoting half a sentence.
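A rough sketch of the heading split, assuming the policy doc is Markdown with `## ` headings. The `PolicyChunk` shape, the `policy:` id prefix, and the function names are illustrative, not a fixed format:

```typescript
interface PolicyChunk {
  id: string;
  heading: string;
  body: string;
}

// One chunk per H2 section. H3 and deeper headings stay inside their section.
function splitOnH2(markdown: string): PolicyChunk[] {
  const chunks: PolicyChunk[] = [];
  let current: { heading: string; lines: string[] } | null = null;

  for (const line of markdown.split(/\r?\n/)) {
    const m = /^##\s+(.+?)\s*$/.exec(line);
    if (m) {
      if (current) chunks.push(toChunk(current.heading, current.lines));
      current = { heading: m[1] ?? '', lines: [] };
    } else if (current) {
      current.lines.push(line);
    }
    // Lines before the first H2 (title, intro) are skipped in this sketch.
  }
  if (current) chunks.push(toChunk(current.heading, current.lines));
  return chunks;
}

function toChunk(heading: string, lines: string[]): PolicyChunk {
  const slug = heading
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-|-$/g, '');
  return { id: `policy:${slug}`, heading, body: lines.join('\n').trim() };
}
```

There is no token limit anywhere in it. A long section stays one long chunk, which is the point.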

Hybrid search, because product names are not vibes

Pure vector search will fail you the first time someone types a SKU. Embeddings smooth out meaning, which is the opposite of what you want when the user types “AC-220-BLK” and means exactly that string.

We run both searches in parallel and fuse the rankings:

async function retrieve(query: string, k = 8) {
  const [vectorHits, keywordHits] = await Promise.all([
    vectorSearch(query, k * 2),
    keywordSearch(query, k * 2),
  ]);
  return reciprocalRankFusion([vectorHits, keywordHits], k);
}

Reciprocal rank fusion is about a dozen lines of code. You don’t need a reranker model for a 200-product catalog. You might want one (Cohere Rerank, or bge-reranker) when you cross a few thousand documents. Until then it’s complexity you don’t need.
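For reference, a minimal sketch of a `reciprocalRankFusion` that matches the call above. It assumes each hit carries a stable `id`; the constant `c = 60` is the conventional default for RRF, not something we tuned:

```typescript
interface Hit {
  id: string;
}

// Each list contributes 1 / (c + rank) per document, with rank starting at 1.
// Documents that rank well in both lists accumulate the highest score.
function reciprocalRankFusion<T extends Hit>(lists: T[][], k: number, c = 60): T[] {
  const scores = new Map<string, { hit: T; score: number }>();

  for (const list of lists) {
    list.forEach((hit, index) => {
      const entry = scores.get(hit.id) ?? { hit, score: 0 };
      entry.score += 1 / (c + index + 1);
      scores.set(hit.id, entry);
    });
  }

  return Array.from(scores.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(entry => entry.hit);
}
```

Only ranks are used, never raw scores, so cosine similarities and BM25 scores never have to be put on the same scale.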

The system prompt is shorter than you think

A mistake I see often: 3,000-token system prompts trying to enumerate every edge case. The model stops reading them. We keep ours around 400 tokens. The shape:

  1. Who you are and who you work for
  2. What you can and cannot do (escalate to human for X, Y)
  3. How to use the retrieved context (quote it, don’t paraphrase prices)
  4. What to do when you don’t know (say so, offer to email the team)

That last one is the difference between a chatbot that ships and one that gets pulled in week two. The prompt explicitly says: if the answer is not in the provided context, say “I’m not sure, want me to pass this to the team?” and call the escalate tool. We then check in code that if the model didn’t call escalate and didn’t cite a retrieved chunk, we flag the response for review.

Guardrails in the prompt, guardrails in code. Neither alone.
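The code half of that check can be sketched like this. The `ModelTurn` shape is an assumption: how you extract tool calls and cited chunk IDs depends on your model API and on the citation format your prompt asks for.

```typescript
// Hypothetical summary of one model response.
interface ModelTurn {
  text: string;
  toolCalls: string[];     // names of the tools the model called this turn
  citedChunkIds: string[]; // chunk IDs the model claims to have used
}

// Flag for review when the model neither escalated nor cited a chunk
// that was actually retrieved for this turn.
function needsReview(turn: ModelTurn, retrievedIds: string[]): boolean {
  const escalated = turn.toolCalls.includes('escalate');
  const retrieved = new Set(retrievedIds);
  const citedRetrieved = turn.citedChunkIds.some(id => retrieved.has(id));
  return !escalated && !citedRetrieved;
}
```

A citation of a chunk ID that was never retrieved counts as no citation, which catches the case where the model invents a source.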

Order status is a tool, not a document

Don’t try to embed order data. It changes by the minute and you’ll be chasing stale answers forever. Give the model a tool:

{
  name: 'lookup_order',
  description: 'Look up order status by order number and email',
  input_schema: {
    type: 'object',
    properties: {
      order_number: { type: 'string' },
      email: { type: 'string' },
    },
    required: ['order_number', 'email'],
  },
}

The model asks for the order number and email, calls the tool, gets back JSON, reports the status. No RAG involved. This pattern (RAG for static knowledge, tools for live data) is what separates the best AI chatbot for small business setups from the ones that lie about tracking numbers.
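It is worth validating the tool arguments before they reach your commerce API, since the model can still send an empty string or something that is not an email. A small sketch, where `parseLookupOrderInput` is a hypothetical helper and the email check is deliberately loose:

```typescript
interface LookupOrderInput {
  order_number: string;
  email: string;
}

// Returns cleaned arguments, or null if the model's input is unusable.
function parseLookupOrderInput(input: unknown): LookupOrderInput | null {
  if (typeof input !== 'object' || input === null) return null;
  const { order_number, email } = input as Record<string, unknown>;
  if (typeof order_number !== 'string' || typeof email !== 'string') return null;

  const orderNumber = order_number.trim();
  const normalizedEmail = email.trim().toLowerCase();
  // Loose sanity check only; the real verification is the order lookup itself.
  if (orderNumber === '' || !normalizedEmail.includes('@')) return null;

  return { order_number: orderNumber, email: normalizedEmail };
}
```

On `null`, return an error result to the model so it asks the customer again instead of guessing.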

The logging table that pays for itself

We log every query, its retrieval IDs and scores, the final answer, latency, and whether escalation fired. One row per turn. Once a week I open that table sorted by “no good retrieval hit” and read the questions. That list is your content roadmap. Half the time the answer exists somewhere on the site and just isn’t in the ingested data. The other half, the business genuinely doesn’t have an answer yet, and now they know.
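As a sketch, the row and the weekly “no good retrieval hit” view might look like the following. The field names and the `minScore` cutoff are assumptions, and scores are assumed to be non-negative with higher meaning better:

```typescript
// One row per turn.
interface TurnLog {
  query: string;
  retrieval: { id: string; score: number }[];
  answer: string;
  latency_ms: number;
  escalated: boolean;
}

// Turns whose best retrieval score is below the cutoff, weakest first.
// A turn with no hits at all counts as a best score of 0.
function weakRetrievalTurns(rows: TurnLog[], minScore: number): TurnLog[] {
  const best = (r: TurnLog) => Math.max(0, ...r.retrieval.map(h => h.score));
  return rows
    .filter(r => best(r) < minScore)
    .sort((a, b) => best(a) - best(b));
}
```

Pick the cutoff by reading a few dozen rows first. The right number depends on your embedding model and on how you fuse scores.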

This is also how you defend the bot when someone says “it gave a wrong answer.” You can pull the exact retrieved context and see whether the model hallucinated or the source document is wrong. Usually it’s the source document.

What this costs

For a typical client doing 1,000 conversations a month, averaging 4 turns each:

Under $60 a month to run, plus the build. That’s the honest number. Anyone quoting you $500/month in infrastructure for a small business chatbot is selling you a vector DB you don’t need.
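The arithmetic behind that figure, using the numbers from the FAQ below: $30-50 a month in API costs and a $7 VPS, with everything else on free tiers. The API range is our estimate for this volume, not something computed from token prices:

```typescript
const conversations = 1000;
const turnsPerConversation = 4;
const apiCost = { low: 30, high: 50 }; // USD per month, estimated range
const vps = 7;                         // USD per month

// Model calls per month.
const turns = conversations * turnsPerConversation;

// Monthly infrastructure total: 37 to 57 USD.
const total = { low: apiCost.low + vps, high: apiCost.high + vps };

// API cost per turn: roughly 0.75 to 1.25 cents.
const perTurn = { low: apiCost.low / turns, high: apiCost.high / turns };
```

At about a cent per turn, the model is the largest line item and is still small next to the build.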

If you want help putting one of these together for your shop, we do this. Tell us what you’re stuck on.

FREQUENTLY ASKED

Common questions

How long does AI chatbot setup take for a small business?

For a typical Shopify or service business with a few hundred products and a handful of policy pages, we ship a working bot in 1-2 weeks. Most of that time is ingestion and writing the policy docs the business never had written down. The code is the fast part. The slow part is getting your team to agree on the actual return policy so the bot can quote it accurately.

Do I need a vector database like Pinecone?

No, not until you have tens of thousands of documents. Postgres with the pgvector extension handles small business catalogs comfortably on a $7/month VPS. Pinecone, Weaviate, and similar services start around $70/month and add operational complexity. Start with pgvector or Supabase. Migrate later if you actually need to. You probably won't.

What's the best AI chatbot for small business if I don't want to build one?

Intercom Fin, Tidio, and Chatbase are reasonable off-the-shelf options. They handle the common case well. The tradeoff is you get less control over retrieval quality, your data lives in their system, and per-conversation pricing adds up fast above a few hundred chats a month. If your knowledge base is small and stable, off the shelf is fine. If you have specific products, custom workflows, or want to own the stack, build.

How do I stop the chatbot from making things up?

Three layers. First, only let the model answer using retrieved context, and require it to cite which chunk it used. Second, in code, check that the response actually corresponds to retrieved content for any claim about price, policy, or stock. Third, give it a clean escape: an escalate tool that emails a human. Models hallucinate when they feel pressure to answer. Remove the pressure.

Can the chatbot handle order status and refunds?

Order status, yes, via a tool call to your commerce platform's API. The model asks for an order number and email, looks it up live, and reports back. Refunds are different. We never let the bot actually issue refunds. It can collect the request, verify eligibility against the policy, and create a ticket for a human to approve. The financial decision stays with a person.

What does it cost to run monthly?

For around 1,000 conversations a month with Claude Sonnet 4, expect $30-50 in API costs, plus a $7 VPS, plus free-tier usage of Cloudflare and an email provider like Resend. Total under $60/month for the infrastructure. If you swap to gpt-4o-mini you can cut the model cost roughly in half with a quality tradeoff that's noticeable but acceptable for many small businesses.
