
AI Integration in Web Development: Practical Use Cases That Drive Revenue

Stratpace Team · 10 January 2026 · 7 min read

One integration, done properly

There's a cottage industry of "ten ways to add AI to your website" posts, and they're almost all useless. Most of them are integrations that solve a problem you don't have, with vendors you've never heard of, costed at "starting from £X / month". So this post does the opposite: one integration, end to end, with real numbers and code you can adapt.

The integration is a knowledge-base assistant. A small box on your site where a visitor can ask a question in plain language, and the answer comes from your own documentation rather than from the model's training data. It's the most common useful AI feature you'll add to a small or mid-size site, and most of the techniques you need for anything else build on this one.

How it works, in one paragraph

You take your knowledge base (help docs, product pages, FAQs, whatever), split it into chunks, generate an embedding for each chunk, and store the chunks alongside their embeddings in a database that supports vector similarity search. When a user asks a question, you embed the question, search for the chunks most similar to it, and pass those chunks to a chat model with a prompt that tells it to answer using only the supplied context. That's it. The pattern is called retrieval-augmented generation, RAG for short, and the rest of this post is the practical version of those two sentences.

Where to keep the vectors

If you're already on Neon Postgres (we are, and it's a sensible default for most small sites), use the pgvector extension. You don't need a separate vector database. One Postgres instance handles your application data and your embeddings in the same query plan, which is simpler to reason about and cheaper to run.

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE kb_chunks (
  id           BIGSERIAL PRIMARY KEY,
  source_url   TEXT NOT NULL,
  content      TEXT NOT NULL,
  embedding    VECTOR(1536) NOT NULL
);

CREATE INDEX ON kb_chunks USING hnsw (embedding vector_cosine_ops);

The 1536 dimensions match OpenAI's text-embedding-3-small, which is what we'll use below. If you swap models, the dimension changes. The HNSW index makes the similarity search fast enough for interactive use without any further tuning at this scale.

Ingesting the knowledge base

Split each document into chunks of around 500 to 1,000 tokens, with a small overlap, then embed and insert. A simple script run on deploy is enough; you don't need a streaming pipeline.

import OpenAI from 'openai'
import { sql } from '@/lib/db'

const openai = new OpenAI()

export async function ingest(docs: { url: string; text: string }[]) {
  for (const doc of docs) {
    // chunkText splits a document into overlapping pieces; a sketch of one
    // possible implementation follows this block
    const chunks = chunkText(doc.text, { size: 800, overlap: 100 })

    // One embeddings call per document: the API accepts an array of inputs
    // and returns one embedding per chunk, in the same order
    const { data } = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: chunks,
    })

    for (let i = 0; i < chunks.length; i++) {
      // JSON.stringify produces '[0.1,0.2,...]', which pgvector accepts as a vector literal
      await sql`
        INSERT INTO kb_chunks (source_url, content, embedding)
        VALUES (${doc.url}, ${chunks[i]}, ${JSON.stringify(data[i].embedding)})
      `
    }
  }
}
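
The script assumes a chunkText helper that isn't shown. Exactly how you split is a judgement call; below is a minimal sketch that treats size and overlap as approximate token counts, using the rough rule that a token is about four characters of English text. A real tokenizer (the tiktoken package, for instance) and paragraph-aware splitting are worthwhile upgrades once the basics work.

// Minimal chunkText sketch: size and overlap are approximate token counts,
// converted to characters with the rough four-characters-per-token rule.
export function chunkText(
  text: string,
  { size, overlap }: { size: number; overlap: number },
): string[] {
  const chunkChars = size * 4
  const overlapChars = overlap * 4
  const chunks: string[] = []

  // Step forward by (chunk - overlap) so consecutive chunks share some context
  for (let start = 0; start < text.length; start += chunkChars - overlapChars) {
    chunks.push(text.slice(start, start + chunkChars))
  }
  return chunks
}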

The API route

This is the route the front-end calls. It embeds the user's question, finds the four most relevant chunks, and asks gpt-4o-mini to answer using only those chunks. The system prompt is the most important part; it tells the model to refuse to answer if the context doesn't cover the question, which is what stops it making things up.

import OpenAI from 'openai'
import { sql } from '@/lib/db'

const openai = new OpenAI()

const SYSTEM_PROMPT = `
You are the documentation assistant for Acme Ltd. Answer the user's
question using only the provided context. If the context does not
contain the answer, say so plainly and suggest they contact support.
Do not invent product features, pricing, or policies. Cite the source
URL after each claim, in square brackets.
`.trim()

export async function POST(req: Request) {
  const { question } = await req.json()

  // Embed the question with the same model used at ingest time, so the
  // question and the stored chunks live in the same vector space
  const { data: [{ embedding }] } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  })

  // <=> is pgvector's cosine distance: smaller means more similar
  const chunks = await sql`
    SELECT content, source_url
    FROM kb_chunks
    ORDER BY embedding <=> ${JSON.stringify(embedding)}
    LIMIT 4
  `

  // Concatenate the retrieved chunks into one context block, keeping each
  // chunk's source URL next to it so the model can cite it
  const context = chunks
    .map((c) => `Source: ${c.source_url}\n${c.content}`)
    .join('\n\n---\n\n')

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: `Context:\n\n${context}\n\nQuestion: ${question}` },
    ],
  })

  return Response.json({ answer: response.choices[0].message.content })
}
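
For completeness, the front-end side is a single fetch. The /api/ask path here is an assumption; use whatever path the route file above is mounted at.

// Hypothetical client-side call to the route above
async function askDocs(question: string): Promise<string> {
  const res = await fetch('/api/ask', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question }),
  })
  const { answer } = await res.json()
  return answer
}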

A few details that matter in production. The <=> operator is pgvector's cosine distance; a smaller number is more similar. Four chunks is a reasonable default; on long docs you may need more. Stream the response if it goes into a chat UI; the user-perceived speed is much better.
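
If you do stream, the shape of the handler barely changes. Here is a minimal sketch, assuming the same openai client, SYSTEM_PROMPT, and retrieved context string as the route above: set stream: true and pipe each delta into a web ReadableStream that the route returns.

// Streaming variant of the final step of the handler above
async function streamAnswer(context: string, question: string): Promise<Response> {
  // With stream: true the SDK returns an async iterable of deltas
  const stream = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    stream: true,
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: `Context:\n\n${context}\n\nQuestion: ${question}` },
    ],
  })

  const body = new ReadableStream({
    async start(controller) {
      const encoder = new TextEncoder()
      for await (const chunk of stream) {
        // Each chunk carries a small piece of the answer text
        controller.enqueue(encoder.encode(chunk.choices[0]?.delta?.content ?? ''))
      }
      controller.close()
    },
  })

  return new Response(body, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  })
}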

What this costs in real numbers

OpenAI's pricing as published on platform.openai.com at the time of writing:

  • text-embedding-3-small: $0.02 per million tokens. Embedding a 50-page knowledge base costs a few pence, one-off; embedding each incoming question costs a negligible fraction of a penny (a question is usually well under 100 tokens).
  • gpt-4o-mini: $0.15 per million input tokens, $0.60 per million output tokens. A typical RAG call sends around 2,000 input tokens (system prompt plus four chunks plus question) and gets back around 200 output tokens. That's roughly $0.0004 per question, give or take.

Round those numbers up and a thousand questions cost about a pound. The Neon Postgres bill for a small business is the same whether you use pgvector or not. The honest answer to "what does this cost" is: less than the email plan you're already paying for.
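
If you want to sanity-check that figure, the unrounded arithmetic fits in a few lines, using the per-token prices and the traffic shape described above.

// Back-of-envelope cost per question: 2,000 input and 200 output tokens
const inputCost = 2_000 * (0.15 / 1_000_000)  // ≈ $0.00030
const outputCost = 200 * (0.60 / 1_000_000)   // ≈ $0.00012
const perQuestion = inputCost + outputCost     // ≈ $0.00042
console.log(perQuestion * 1_000)               // ≈ $0.42 per thousand questions, before rounding up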

Why grounding matters more than the model choice

Every team's first instinct is to upgrade the model. It almost never helps. The thing that determines whether the assistant is useful is the quality of the retrieval and the discipline of the prompt. If the retrieved chunks don't contain the answer, no model on earth will answer correctly; if the prompt doesn't forbid invention, even a strong model will quietly make things up. So before you reach for gpt-4o or its successors, look at the chunks the system retrieved for a failed question. Nine times out of ten, the fix is in the data, not the model.
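
A small helper makes that inspection painless. This sketch assumes the same openai client and sql tag as the route above; it embeds the failed question and prints what the index actually returned, with distances (smaller is more similar).

// Hypothetical debugging helper: see which chunks retrieval returns for a question
export async function debugRetrieval(question: string) {
  const { data: [{ embedding }] } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  })

  const rows = await sql`
    SELECT source_url,
           left(content, 120)                         AS preview,
           embedding <=> ${JSON.stringify(embedding)} AS distance
    FROM kb_chunks
    ORDER BY distance
    LIMIT 10
  `

  console.table(rows)
}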

When this is the wrong answer

A few cases where you don't want this. If your knowledge base is small and stable, ten frequently asked questions will outperform a chatbot. If your users mainly want to be put in touch with a human, a clear contact route beats an AI that delays them by a turn. If you can't be confident the assistant won't say something embarrassing, don't ship it; the cost of one bad screenshot on social media is much higher than the cost of not having the feature.

The summary

Vector search plus a chat model, grounded to your own documentation, is the AI integration that earns its keep on most small business sites. Keep the vectors in the Postgres you already have, use cheap models (the embedding model and gpt-4o-mini are fine), discipline the prompt, and watch the retrieval quality before the model quality. The whole thing is around 80 lines of code and costs less than a coffee per thousand questions.

AI Integration · Chatbots · Personalisation · Machine Learning · Revenue