How We Built a Lightning-Fast RAG Chatbot with Cloudflare (And Why It Matters)
At Blocksoft, we’re constantly asked how we deliver cutting-edge AI solutions so quickly. Today, we’re lifting the curtain on our latest internal innovation—a fully edge-deployed Retrieval-Augmented Generation (RAG) knowledge chatbot, built entirely using Cloudflare’s ecosystem: Pages, Workers AI, Vectorize, and D1.
Here’s our journey, from ideation and architecture to deployment, complete with lessons learned and pro tips along the way.
🚀 The Challenge
We set out to build an intelligent assistant that could:
- Provide accurate, hallucination-free answers about our services.
- Adhere strictly to compliance regulations (e.g., filtering out phone numbers).
- Achieve ultra-low latency, particularly crucial for our European clients.
- Seamlessly fit into our existing development workflow—commit, push, and deploy.
🔧 Our Tech Stack
| Component | Technology | Why We Chose It |
|---|---|---|
| Vector Store | Cloudflare Vectorize | Ultra-fast embedding retrieval at the edge |
| Embeddings | Workers AI (`@cf/baai/bge-base-en-v1.5`) | Fully integrated, zero external API calls |
| Structured Data | Cloudflare D1 | Efficiently manages dynamic “lore,” logs, and backups |
| Edge Runtime | Cloudflare Pages Functions | Immediate response times, no cold starts |
| CI/CD | GitHub Actions → Cloudflare Pages | Automated, reliable, and instantly reversible |
| Frontend | Astro + React | Modern, scalable frontend with streaming UI |
All Cloudflare bindings are managed via `wrangler.toml` (more on this below).
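As an illustration, the bindings might be declared like this; the project name, binding names, index name, and IDs below are placeholders, not our production values:

```toml
# Illustrative wrangler.toml — binding names and IDs are placeholders.
name = "blocksoft-rag"
compatibility_date = "2024-09-01"

# Workers AI binding (exposed as env.AI)
[ai]
binding = "AI"

# Vectorize index for the playbook embeddings
[[vectorize]]
binding = "VECTOR_INDEX"
index_name = "blocksoft-playbook"

# D1 database for lore, logs, and chunk backups
[[d1_databases]]
binding = "DB"
database_name = "blocksoft-rag"
database_id = "<your-d1-database-id>"
```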
🧠 Training the AI: Our Ingestion Workflow
- **Single-Source Content Management**: Knowledge articles live as structured JSON (`blocksoft_playbook_v1.json`), so authors can update content without developer intervention.
- **Smart Chunking Strategy**: A custom utility splits articles into ~500-token slices, keeping per-chunk metadata small enough for seamless Vectorize ingestion.
- **Efficient Bulk Import**: Our ingestion script embeds each chunk with Workers AI and upserts it into Vectorize:

```js
// Workers AI returns { shape, data }, where data[0] is the embedding vector.
const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text });
// Vectorize records need an explicit id alongside the vector values.
await vectorStore.upsert([{ id, values: data[0], metadata }]);
```

Every chunk is simultaneously backed up in Cloudflare D1.
- **Continuous Integration**: Content updates trigger automated GitHub Actions workflows that refresh embeddings on every commit.
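The chunking step above can be sketched roughly as follows. This is a simplified illustration, not our actual utility: `chunkArticle` is a name we made up here, and the ~500-token budget is approximated as ~4 characters per token (real tokenizers differ).

```javascript
// Sketch of a chunker: splits an article into roughly ~500-"token" slices.
// Token count is approximated as length / 4 characters; a real tokenizer
// (e.g. the model's own) would be more accurate.
const MAX_TOKENS = 500;
const CHARS_PER_TOKEN = 4;

function chunkArticle(text, maxChars = MAX_TOKENS * CHARS_PER_TOKEN) {
  const chunks = [];
  let current = "";
  // Split on sentence boundaries so each slice stays coherent.
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    if (current && current.length + sentence.length + 1 > maxChars) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? current + " " + sentence : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Sentence-boundary splitting keeps each chunk readable on its own, which matters because retrieved chunks are pasted verbatim into the prompt.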
💬 Natural Conversation: Dynamic Lore & Prompt Engineering
Instead of static prompts, our chatbot dynamically loads conversational “lore” from Cloudflare D1:
```sql
CREATE TABLE lore (
  id TEXT PRIMARY KEY,
  content TEXT NOT NULL,
  created_at TEXT,
  updated_at TEXT
);
```
At runtime, the chatbot:
- Retrieves dynamic lore from D1
- Executes semantic search
- Generates context-rich prompts
- Streams responses from the LLM via OpenRouter
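The prompt-assembly step in that flow might look like the sketch below. The `buildPrompt` helper and its template are our illustration, not the production code; the row and match shapes mirror what D1 query results and Vectorize matches return.

```javascript
// Assembles a system prompt from D1 lore rows and Vectorize matches.
// Illustrative template: lore first, then grounding instruction, then
// numbered context chunks, then the user's question.
function buildPrompt(loreRows, matches, question) {
  const lore = loreRows.map((row) => row.content).join("\n");
  const context = matches
    .map((m, i) => `[${i + 1}] ${m.metadata.text}`)
    .join("\n");
  return [
    lore,
    "Answer strictly from the context below. If the answer is not present, say so.",
    "Context:",
    context,
    `Question: ${question}`,
  ].join("\n\n");
}
```

Numbering the chunks lets the model (and our logs) attribute each claim to a specific retrieved passage.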
Adjustments to tone or policy are seamless—updated directly via SQL without redeployments.
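For example, a tone tweak can be a single UPDATE against the `lore` table (the row `id` here is illustrative):

```sql
UPDATE lore
SET content = 'Keep answers under three sentences.',
    updated_at = datetime('now')
WHERE id = 'tone-policy';
```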
📊 Efficient Logging Strategy
We deploy a dual-layer logging system:
- Detailed chat logs: granular tracking for analytics
- Snapshot conversation logs: session summaries capped at 500 characters for immediate monitoring
Archival to Cloudflare R2 ensures compliance and scalability.
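The snapshot layer can be sketched as a simple capping helper. The function name and turn shape are our illustration; the 500-character budget matches the cap above.

```javascript
// Caps a conversation snapshot at a fixed character budget (500 here),
// keeping the most recent turns so the snapshot reflects current state.
const SNAPSHOT_LIMIT = 500;

function snapshotConversation(turns, limit = SNAPSHOT_LIMIT) {
  const lines = turns.map((t) => `${t.role}: ${t.text}`);
  let snapshot = "";
  // Walk backwards so the newest turns survive the cap.
  for (let i = lines.length - 1; i >= 0; i--) {
    const candidate = snapshot ? `${lines[i]}\n${snapshot}` : lines[i];
    if (candidate.length > limit) break;
    snapshot = candidate;
  }
  return snapshot;
}
```

Trimming from the oldest turn first means a monitoring dashboard always sees how a session ended, not how it began.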
🛡️ Security and User Experience
- Cloudflare Turnstile: Invisible CAPTCHA protects APIs effortlessly.
- RegEx-based filtering: Sensitive information, like phone numbers, is always sanitized.
- Instant Response Streaming: Users see the first tokens in under 200 ms, so replies feel instantaneous.
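A minimal sketch of such a filter is below. The pattern is illustrative: it catches common international and US-style numbers, not every possible format, and a production ruleset would be broader.

```javascript
// Redacts phone-number-like sequences before a reply is streamed out.
// Illustrative pattern: optional "+", then 8+ digits mixed with common
// separators (spaces, parentheses, dots, dashes).
const PHONE_RE = /\+?\d[\d\s().-]{6,}\d/g;

function sanitizeReply(text) {
  return text.replace(PHONE_RE, "[redacted]");
}
```

Running this on the server side, just before tokens leave the edge function, means the rule holds no matter which model or prompt produced the text.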
⚙️ Deployment Magic: Our GitHub Actions Workflow
Automated deployments occur seamlessly with every commit:
```yaml
on:
  push:
    branches: [main, master, blocksoft]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run build
      - run: npm run setup-rag
      - run: npx wrangler d1 migrations apply DB --remote
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
      - uses: cloudflare/pages-action@v1
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
```
Critical credentials never touch our codebase—everything’s securely managed via GitHub secrets.
🗃️ Real-Time Development: Rapid Iteration
Our agile commit history, with titles like “rag_11_improved_formatting” and “cf fix 19”, reflects our iterative, feedback-driven approach. Quick deployments meant immediate QA feedback, enabling us to fine-tune the chatbot in real time.
💡 Lessons Learned
- Chunk Size is Crucial: Correct chunk sizing avoids ingestion issues and enhances accuracy.
- Dynamic Lore is Powerful: Storing prompts in D1 gives product managers flexibility without developer overhead.
- Edge Integration Wins: Having all components (embedding, vectors, SQL, JS) at Cloudflare dramatically reduces latency and operational costs.
- Two-Tier Logs: The dual-layer scheme gives comprehensive analytics without performance penalties.
- Automation is Essential: Git-driven deployments prevent human errors and streamline operations.
🚧 What’s Next?
We’re actively expanding our capabilities: