How We Built a Lightning-Fast RAG Chatbot with Cloudflare (And Why It Matters)
At Blocksoft, we’re constantly asked how we deliver cutting-edge AI solutions so quickly. Today, we’re lifting the curtain on our latest internal innovation—a fully edge-deployed Retrieval-Augmented Generation (RAG) knowledge chatbot, built entirely using Cloudflare’s ecosystem: Pages, Workers AI, Vectorize, and D1.
Here’s our journey, from ideation and architecture to deployment, complete with lessons learned and pro tips along the way.
🚀 The Challenge
We set out to build an intelligent assistant that could:
- Provide accurate, hallucination-free answers about our services.
- Adhere strictly to compliance regulations (e.g., filtering out phone numbers).
- Achieve ultra-low latency, particularly crucial for our European clients.
- Seamlessly fit into our existing development workflow—commit, push, and deploy.
🔧 Our Tech Stack
| Component | Technology | Why We Chose It |
|---|---|---|
| Vector Store | Cloudflare Vectorize | Ultra-fast embedding retrieval at the edge |
| Embeddings | Workers AI (`@cf/baai/bge-base-en-v1.5`) | Fully integrated, zero external API calls |
| Structured Data | Cloudflare D1 | Efficiently manages dynamic “lore,” logs, and backups |
| Edge Runtime | Cloudflare Pages Functions | Immediate response times, no cold starts |
| CI/CD | GitHub Actions → Cloudflare Pages | Automated, reliable, and instantly reversible |
| Frontend | Astro + React | Modern, scalable frontend with streaming UI |
All Cloudflare bindings are managed via `wrangler.toml` (more on this below).
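As an illustration, the bindings might be declared like this; the project name, binding names, index name, and IDs below are placeholders, not our production values:

```toml
# Illustrative wrangler.toml — binding names and IDs are placeholders.
name = "blocksoft-rag"
compatibility_date = "2024-09-01"

# Workers AI binding (exposed as env.AI)
[ai]
binding = "AI"

# Vectorize index for the playbook embeddings
[[vectorize]]
binding = "VECTOR_INDEX"
index_name = "blocksoft-playbook"

# D1 database for lore, logs, and chunk backups
[[d1_databases]]
binding = "DB"
database_name = "blocksoft-rag"
database_id = "<your-d1-database-id>"
```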
🧠 Training the AI: Our Ingestion Workflow
- **Single-Source Content Management**: Knowledge articles live as structured JSON (`blocksoft_playbook_v1.json`), so authors can update content without developer intervention.
- **Smart Chunking Strategy**: A custom utility splits articles into ~500-token slices, keeping per-chunk metadata small enough for seamless Vectorize ingestion.
- **Efficient Bulk Import**: Our ingestion script embeds each chunk with Workers AI and upserts it into Vectorize:

```js
// Workers AI returns { shape, data }, where data[0] is the embedding vector.
const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text });
// Vectorize records need an explicit id alongside the vector values.
await vectorStore.upsert([{ id, values: data[0], metadata }]);
```

Every chunk is simultaneously backed up in Cloudflare D1.
- **Continuous Integration**: Content updates trigger automated GitHub Actions workflows that refresh embeddings on every commit.
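The chunking step above can be sketched roughly as follows. This is a simplified illustration, not our actual utility: `chunkArticle` is a name we made up here, and the ~500-token budget is approximated as ~4 characters per token (real tokenizers differ).

```javascript
// Sketch of a chunker: splits an article into roughly ~500-"token" slices.
// Token count is approximated as length / 4 characters; a real tokenizer
// (e.g. the model's own) would be more accurate.
const MAX_TOKENS = 500;
const CHARS_PER_TOKEN = 4;

function chunkArticle(text, maxChars = MAX_TOKENS * CHARS_PER_TOKEN) {
  const chunks = [];
  let current = "";
  // Split on sentence boundaries so each slice stays coherent.
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    if (current && current.length + sentence.length + 1 > maxChars) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? current + " " + sentence : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Sentence-boundary splitting keeps each chunk readable on its own, which matters because retrieved chunks are pasted verbatim into the prompt.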
💬 Natural Conversation: Dynamic Lore & Prompt Engineering
Instead of static prompts, our chatbot dynamically loads conversational “lore” from Cloudflare D1:
```sql
CREATE TABLE lore (
  id TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  created_at TEXT,
  updated_at TEXT
);
```
At runtime, the chatbot:
- Retrieves dynamic lore from D1
- Executes semantic search
- Generates context-rich prompts
- Streams responses from the LLM via OpenRouter
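The prompt-assembly step in that flow might look like the sketch below. The `buildPrompt` helper and its template are our illustration, not the production code; the row and match shapes mirror what D1 query results and Vectorize matches return.

```javascript
// Assembles a system prompt from D1 lore rows and Vectorize matches.
// Illustrative template: lore first, then grounding instruction, then
// numbered context chunks, then the user's question.
function buildPrompt(loreRows, matches, question) {
  const lore = loreRows.map((row) => row.content).join("\n");
  const context = matches
    .map((m, i) => `[${i + 1}] ${m.metadata.text}`)
    .join("\n");
  return [
    lore,
    "Answer strictly from the context below. If the answer is not present, say so.",
    "Context:",
    context,
    `Question: ${question}`,
  ].join("\n\n");
}
```

Numbering the chunks lets the model (and our logs) attribute each claim to a specific retrieved passage.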
Adjustments to tone or policy are seamless—updated directly via SQL without redeployments.
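For example, a tone tweak can be a single UPDATE against the `lore` table (the row `id` here is illustrative):

```sql
UPDATE lore
SET content = 'Keep answers under three sentences.',
    updated_at = datetime('now')
WHERE id = 'tone-policy';
```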
📊 Efficient Logging Strategy
We deploy a dual-layer logging system:
- Detailed chat logs: granular tracking for analytics
- Snapshot conversation logs: session summaries capped at 500 characters for immediate monitoring
Archival to Cloudflare R2 ensures compliance and scalability.
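The snapshot layer can be sketched as a simple capping helper. The function name and turn shape are our illustration; the 500-character budget matches the cap above.

```javascript
// Caps a conversation snapshot at a fixed character budget (500 here),
// keeping the most recent turns so the snapshot reflects current state.
const SNAPSHOT_LIMIT = 500;

function snapshotConversation(turns, limit = SNAPSHOT_LIMIT) {
  const lines = turns.map((t) => `${t.role}: ${t.text}`);
  let snapshot = "";
  // Walk backwards so the newest turns survive the cap.
  for (let i = lines.length - 1; i >= 0; i--) {
    const candidate = snapshot ? `${lines[i]}\n${snapshot}` : lines[i];
    if (candidate.length > limit) break;
    snapshot = candidate;
  }
  return snapshot;
}
```

Trimming from the oldest turn first means a monitoring dashboard always sees how a session ended, not how it began.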
🛡️ Security and User Experience
- Cloudflare Turnstile: Invisible CAPTCHA protects APIs effortlessly.
- RegEx-based filtering: Sensitive information, like phone numbers, is always sanitized.
- Instant Response Streaming: Users see the first tokens in under 200 ms, so replies feel instantaneous.
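A minimal sketch of such a filter is below. The pattern is illustrative: it catches common international and US-style numbers, not every possible format, and a production ruleset would be broader.

```javascript
// Redacts phone-number-like sequences before a reply is streamed out.
// Illustrative pattern: optional "+", then 8+ digits mixed with common
// separators (spaces, parentheses, dots, dashes).
const PHONE_RE = /\+?\d[\d\s().-]{6,}\d/g;

function sanitizeReply(text) {
  return text.replace(PHONE_RE, "[redacted]");
}
```

Running this on the server side, just before tokens leave the edge function, means the rule holds no matter which model or prompt produced the text.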
⚙️ Deployment Magic: Our GitHub Actions Workflow
Automated deployments occur seamlessly with every commit:
```yaml
on:
  push:
    branches: [main, master, blocksoft]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run build
      - run: npm run setup-rag
      - run: npx wrangler d1 migrations apply DB --remote
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
      - uses: cloudflare/pages-action@v1
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
```
Critical credentials never touch our codebase—everything’s securely managed via GitHub secrets.
🗃️ Real-Time Development: Rapid Iteration
Our agile commit history, with titles like “rag_11_improved_formatting” and “cf fix 19”, reflects our iterative, feedback-driven approach. Quick deployments meant immediate QA feedback, enabling us to fine-tune the chatbot in real time.
💡 Lessons Learned
- Chunk Size is Crucial: Correct chunk sizing avoids ingestion issues and enhances accuracy.
- Dynamic Lore is Powerful: Storing prompts in D1 gives product managers flexibility without developer overhead.
- Edge Integration Wins: Having all components (embedding, vectors, SQL, JS) at Cloudflare dramatically reduces latency and operational costs.
- Two-Tier Logs: The dual-layer scheme gives comprehensive analytics without performance penalties.
- Automation is Essential: Git-driven deployments prevent human errors and streamline operations.
🚧 What’s Next?
We’re actively expanding our capabilities: