stolberg

How We Built a Lightning-Fast RAG Chatbot with Cloudflare (And Why It Matters)

At Blocksoft, we’re constantly asked how we deliver cutting-edge AI solutions so quickly. Today, we’re lifting the curtain on our latest internal innovation—a fully edge-deployed Retrieval-Augmented Generation (RAG) knowledge chatbot, built entirely using Cloudflare’s ecosystem: Pages, Workers AI, Vectorize, and D1.

Here’s our journey, from ideation and architecture to deployment, complete with lessons learned and pro tips along the way.


🚀 The Challenge

We set out to build an intelligent assistant that could:

  • Provide accurate, hallucination-free answers about our services.
  • Adhere strictly to compliance regulations (e.g., filtering out phone numbers).
  • Achieve ultra-low latency, which is particularly crucial for our European clients.
  • Seamlessly fit into our existing development workflow—commit, push, and deploy.

🔧 Our Tech Stack

| Component | Technology | Why We Chose It |
| --- | --- | --- |
| Vector Store | Cloudflare Vectorize | Ultra-fast embedding retrieval at the edge |
| Embeddings | Workers AI (@cf/baai/bge-base-en-v1.5) | Fully integrated, zero external API calls |
| Structured Data | Cloudflare D1 | Efficiently manages dynamic “lore,” logs, backups |
| Edge Runtime | Cloudflare Pages Functions | Immediate response times, no cold-starts |
| CI/CD | GitHub Actions → Cloudflare Pages | Automated, reliable, and instantly reversible |
| Frontend | Astro + React | Modern, scalable frontend with streaming UI |

All Cloudflare bindings managed via wrangler.toml—more on this below.
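
For reference, the bindings for a project like this might be declared in wrangler.toml roughly as follows (the names and IDs below are illustrative assumptions, not our actual configuration):

```toml
name = "blocksoft-rag-chatbot"        # hypothetical project name
pages_build_output_dir = "dist"

[ai]
binding = "AI"                        # exposed as env.AI in Pages Functions

[[vectorize]]
binding = "VECTORIZE"                 # exposed as env.VECTORIZE
index_name = "knowledge-index"        # hypothetical index name

[[d1_databases]]
binding = "DB"                        # exposed as env.DB
database_name = "chatbot-db"          # hypothetical database name
database_id = "<your-d1-database-id>"
```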


🧠 Training the AI: Our Ingestion Workflow

  1. Single-Source Content Management: Knowledge articles live as structured JSON (blocksoft_playbook_v1.json), so authors can update content without developer intervention.
  2. Smart Chunking Strategy: Our custom utility splits articles into ~500-token slices, keeping each chunk and its metadata small enough for seamless Vectorize ingestion.
  3. Efficient Bulk Import: For each chunk, our ingestion script embeds the text and upserts it into Vectorize, which expects an id and a values array:

// Workers AI returns { shape, data }; data[0] is the embedding vector.
const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [text] });
await vectorStore.upsert([{ id, values: data[0], metadata }]);

Every chunk is simultaneously backed up in Cloudflare D1.

  4. Continuous Integration: Content updates trigger automated GitHub Actions workflows, refreshing embeddings on each commit.
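
The chunking step above can be sketched as follows. This is a simplified illustration that approximates tokens at roughly four characters each; the function name and splitting heuristic are assumptions, not our actual utility:

```javascript
// Simplified sketch of a ~500-token chunker (illustrative only).
// Tokens are approximated at ~4 characters each, so 500 tokens ≈ 2000 chars.
function chunkArticle(text, maxTokens = 500) {
  const maxChars = maxTokens * 4;
  const chunks = [];
  let current = "";
  // Split on sentence boundaries so slices stay semantically coherent.
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    if (current && current.length + sentence.length + 1 > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence + " ";
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Each returned slice is then embedded and upserted as part of the bulk import step.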

💬 Natural Conversation: Dynamic Lore & Prompt Engineering

Instead of static prompts, our chatbot dynamically loads conversational “lore” from Cloudflare D1:

CREATE TABLE lore (
  id TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  created_at TEXT,
  updated_at TEXT
);

At runtime, the chatbot:

  • Retrieves dynamic lore from D1
  • Executes semantic search
  • Generates context-rich prompts
  • Streams responses via the OpenRouter LLM

Adjustments to tone or policy are seamless—updated directly via SQL without redeployments.
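
Put together, the runtime flow up to prompt assembly might look roughly like this. Binding names such as env.DB and env.VECTORIZE, the topK of 5, and the prompt layout are assumptions; the final streaming call to OpenRouter is omitted:

```javascript
// Sketch of the runtime flow: lore from D1 → semantic search → prompt.
async function buildPrompt(env, question) {
  // 1. Load dynamic "lore" rows from D1.
  const { results } = await env.DB.prepare("SELECT content FROM lore").all();
  const lore = results.map(r => r.content).join("\n");

  // 2. Embed the question and run semantic search against Vectorize.
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [question] });
  const { matches } = await env.VECTORIZE.query(data[0], { topK: 5, returnMetadata: "all" });
  const context = matches.map(m => m.metadata.text).join("\n---\n");

  // 3. Assemble the context-rich prompt (streaming happens afterwards).
  return `${lore}\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```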


📊 Efficient Logging Strategy

We deploy a dual-layer logging system:

  • Detailed chat logs: granular tracking for analytics
  • Snapshot conversation logs: session summaries capped at 500 characters for immediate monitoring

Archival to Cloudflare R2 ensures compliance and scalability.
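
As an illustration, the 500-character snapshot could be produced by a helper like this (the function name and message shape are assumptions):

```javascript
// Cap a session summary at maxLen characters for quick monitoring.
function snapshotSummary(messages, maxLen = 500) {
  const joined = messages.map(m => `${m.role}: ${m.content}`).join(" | ");
  // Truncate with an ellipsis so the stored row never exceeds maxLen.
  return joined.length > maxLen ? joined.slice(0, maxLen - 1) + "…" : joined;
}
```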


🛡️ Security and User Experience

  • Cloudflare Turnstile: Invisible CAPTCHA protects APIs effortlessly.
  • RegEx-based filtering: Sensitive information, like phone numbers, is always sanitized.
  • Instant Response Streaming: Users experience near-instantaneous replies (<200ms latency).
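
A minimal sketch of the RegEx-based filtering, assuming a deliberately aggressive pattern (the production patterns are not shown here):

```javascript
// Illustrative phone-number pattern: optional country code, optional
// area code in parentheses, then 2-4 groups of 2-4 digits. Intentionally
// aggressive for compliance, so it may also catch other digit runs.
const PHONE_PATTERN = /(?:\+?\d{1,3}[ .-]?)?(?:\(\d{2,4}\)[ .-]?)?\d{2,4}(?:[ .-]?\d{2,4}){1,3}/g;

function sanitize(text) {
  return text.replace(PHONE_PATTERN, "[redacted]");
}
```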

⚙️ Deployment Magic: Our GitHub Actions Workflow

Automated deployments occur seamlessly with every commit:

on:
  push:
    branches: [main, master, blocksoft]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run build
      - run: npm run setup-rag
      - run: npx wrangler d1 migrations apply DB --remote
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
      - uses: cloudflare/pages-action@v1
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}

Critical credentials never touch our codebase—everything’s securely managed via GitHub secrets.


🗃️ Real-Time Development: Rapid Iteration

Our agile commit history—titles like “rag_11_improved_formatting” and “cf fix 19”—reflects our iterative, feedback-driven approach. Quick deployments meant immediate QA feedback, enabling us to fine-tune our chatbot in real time.


💡 Lessons Learned

  • Chunk Size is Crucial: Correct chunk sizing avoids ingestion issues and enhances accuracy.
  • Dynamic Lore is Powerful: Storing prompts in D1 gives product managers flexibility without developer overhead.
  • Edge Integration Wins: Having all components (embedding, vectors, SQL, JS) at Cloudflare dramatically reduces latency and operational costs.
  • Two-Tier Logs: Ensure comprehensive analytics without performance penalties.
  • Automation is Essential: Git-driven deployments prevent human errors and streamline operations.

🚧 What’s Next?

We’re actively expanding our capabilities: