Clawrity
Multi-channel AI business intelligence agent. Ask questions in natural language via Slack or REST API and get data-grounded answers with specific numbers, daily digests, budget recommendations, ROI forecasts, and competitor intelligence.
Architecture
```
User (Slack/API) → ProtocolAdapter → Orchestrator → NL-to-SQL → PostgreSQL
                                          ↓
                                     Gen Agent (LLM) → QA Agent → Response
                                          ↑
                                     RAG Retriever (pgvector)
                                          ↑
                                     Scout Agent (web search)
```
- Orchestrator — coordinates the full pipeline with retry logic
- Gen Agent — generates data-grounded responses with specific figures
- QA Agent — validates responses for hallucinations (branch names, numbers)
- Scout Agent — fetches competitor/sector news via Tavily
- RAG Retriever — semantic search over historical business data (pgvector)
- SOUL.md — per-client personality and rules
- HEARTBEAT.md — autonomous daily digest scheduling
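The listing above maps onto a fairly small control flow: retrieve context, translate the question to SQL, draft an answer, and let the QA Agent gate the result with a retry. A minimal sketch of that loop, with stand-in functions (the names here are illustrative, not the real module interfaces in agents/ and skills/):

```python
# Hedged sketch of the orchestrator's retry loop; function names are illustrative stand-ins.
from dataclasses import dataclass

@dataclass
class Verdict:
    grounded: bool  # QA Agent's judgement: figures/branch names match the data
    reason: str = ""

def retrieve_context(client_id: str, message: str) -> list[str]:
    """Stand-in for the RAG Retriever (pgvector semantic search)."""
    return []

def run_nl_to_sql(client_id: str, message: str) -> list[dict]:
    """Stand-in for skills/nl_to_sql.py: question → SQL → rows."""
    return []

def generate_response(message: str, rows: list[dict], context: list[str]) -> str:
    """Stand-in for the Gen Agent (LLM call)."""
    return "draft answer"

def qa_check(draft: str, rows: list[dict]) -> Verdict:
    """Stand-in for the QA Agent hallucination check."""
    return Verdict(grounded=True)

def handle_message(client_id: str, message: str, max_retries: int = 2) -> str:
    context = retrieve_context(client_id, message)
    rows = run_nl_to_sql(client_id, message)
    draft = ""
    for _ in range(max_retries + 1):
        draft = generate_response(message, rows, context)  # Gen Agent
        if qa_check(draft, rows).grounded:                 # QA Agent gate
            return draft
    return draft  # fall back to the last draft if QA never passes
```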
Tech Stack
| Component | Tool |
|---|---|
| Language | Python 3.11 |
| API Framework | FastAPI + uvicorn |
| LLM | Groq (llama-3.3-70b-versatile) or NVIDIA NIM |
| Embeddings | sentence-transformers all-MiniLM-L6-v2 (384d) |
| Database | PostgreSQL + pgvector |
| Channel | Slack Bolt SDK (Socket Mode) |
| Scheduler | APScheduler |
| Web Search | Tavily API + DuckDuckGo fallback |
| Forecasting | Prophet |
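As a quick sanity check of the embedding component, the MiniLM model produces 384-dimensional vectors, which is what the pgvector column dimension has to match:

```python
# Minimal embedding check with the model listed above; downloads ~80 MB on first use.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(["Total revenue for the Seattle branch in 2014"])
print(vectors.shape)  # (1, 384), matching the pgvector column dimension
```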
Quick Start (From Scratch)
Prerequisites
- Python 3.11+
- Docker & Docker Compose
- Groq API key (free)
- Tavily API key (free)
1. Clone & Setup
git clone <your-repo-url>
cd clawrity
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # Linux/Mac
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
2. Configure Environment
cp .env.example .env
Edit .env and fill in your keys:
GROQ_API_KEY=gsk_... # from console.groq.com
DATABASE_URL=postgresql://user:pass@localhost:5432/clawrity
TAVILY_API_KEY=tvly-... # from app.tavily.com
# Slack (optional — for Slack integration)
SLACK_BOT_TOKEN=xoxb-...
SLACK_APP_TOKEN=xapp-...
SLACK_SIGNING_SECRET=...
# Digest webhook (optional)
ACME_SLACK_WEBHOOK=https://hooks.slack.com/services/...
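These keys are read at startup by config/settings.py via pydantic-settings. A rough sketch of what such a loader looks like; the exact field list and defaults here are assumptions, not the project's actual settings class:

```python
# Sketch of a pydantic-settings loader for the .env keys above (field set is illustrative).
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    groq_api_key: str
    database_url: str
    tavily_api_key: str
    slack_bot_token: str | None = None      # optional: Slack integration
    slack_app_token: str | None = None
    slack_signing_secret: str | None = None
    acme_slack_webhook: str | None = None   # optional: digest webhook

settings = Settings()
print(settings.database_url)
```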
3. Start PostgreSQL + pgvector
docker compose up -d postgres
Wait ~10 seconds for PostgreSQL to initialize, then verify:
docker compose ps
# postgres should show "healthy"
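To confirm the pgvector extension itself is available (and not just that the container is up), a quick check from Python, assuming psycopg2 is installed and DATABASE_URL is set as in step 2:

```python
# Quick connectivity + extension check; assumes psycopg2-binary and DATABASE_URL.
import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn.cursor() as cur:
    cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector';")
    row = cur.fetchone()
    print("pgvector version:", row[0] if row else "NOT INSTALLED")
conn.close()
```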
4. Download Datasets
Download these two Kaggle datasets and place the files in data/raw/:
- Global Superstore: https://kaggle.com/datasets/apoorvaappz/global-super-store-dataset
- Marketing Campaign Performance: https://kaggle.com/datasets/manishabhatt22/marketing-campaign-performance-dataset
mkdir -p data/raw data/processed
# Place Global_Superstore2.csv and marketing_campaign_dataset.csv in data/raw/
5. Seed Demo Data
python scripts/seed_demo_data.py --client_id acme_corp \
--superstore data/raw/Global_Superstore2.csv \
--marketing data/raw/marketing_campaign_dataset.csv
6. Run RAG Pipeline
python scripts/run_rag_pipeline.py --client_id acme_corp
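This step preprocesses, chunks, and embeds the seeded data into pgvector. At query time, retrieval amounts to embedding the question and running a cosine-distance search; the sketch below illustrates the idea, but the table and column names (rag_chunks, embedding, content, client_id) are hypothetical, not the project's actual schema:

```python
# Illustrative pgvector similarity search; table and column names are hypothetical.
import os
import psycopg2
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
query_vec = model.encode("revenue trend for the Seattle branch").tolist()

conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT content
        FROM rag_chunks                      -- hypothetical table name
        WHERE client_id = %s
        ORDER BY embedding <=> %s::vector    -- cosine distance (pgvector)
        LIMIT 5;
        """,
        ("acme_corp", str(query_vec)),
    )
    for (content,) in cur.fetchall():
        print(content[:120])
conn.close()
```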
7. Start the Server
uvicorn main:app --reload --port 8000
Server runs at http://localhost:8000. Health check: http://localhost:8000/health
Test the API
# Simple question
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"client_id": "acme_corp", "message": "What is the total revenue for the Seattle branch?"}'
# Recommendation question
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"client_id": "acme_corp", "message": "How can we improve revenue for the Seattle branch?"}'
# Trigger digest
curl -X POST http://localhost:8000/digest \
-H "Content-Type: application/json" \
-d '{"client_id": "acme_corp"}'
Slack Bot Setup (Socket Mode)
1. Create Slack App
- Go to https://api.slack.com/apps
- Click Create New App → From scratch
- Name it `Clawrity` and select your workspace
2. Enable Socket Mode
- Left sidebar → Socket Mode → Toggle ON
- Generate Token → name it `clawrity-socket`
- Copy the `xapp-...` token → paste into `.env` as `SLACK_APP_TOKEN`
3. Configure Bot Permissions
- OAuth & Permissions → Bot Token Scopes, add:
app_mentions:readchat:writechannels:historychannels:readim:historyim:readim:write
- Click Install to Workspace
- Copy the `xoxb-...` token → paste into `.env` as `SLACK_BOT_TOKEN`
4. Enable Events
- Event Subscriptions → Toggle ON
- Under Subscribe to bot events, add:
  `app_mention`, `message.channels`, `message.im`
- Click Save Changes
5. Get Signing Secret
- Basic Information → App Credentials
- Copy Signing Secret → paste into `.env` as `SLACK_SIGNING_SECRET`
6. Invite Bot to Channel
/invite @Clawrity
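Once the tokens are in `.env`, the Socket Mode wiring in channels/slack_handler.py amounts to a Bolt app plus a socket handler. The skeleton below shows the shape of it; the echo reply is a placeholder for the real ProtocolAdapter → Orchestrator call:

```python
# Minimal Slack Bolt Socket Mode skeleton; the echo reply stands in for the real pipeline.
import os
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],
)

@app.event("app_mention")
def on_mention(event, say):
    question = event.get("text", "")
    # In Clawrity this would be routed through the ProtocolAdapter and Orchestrator.
    say(f"Received: {question}")

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```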
API Endpoints
| Method | Path | Description |
|---|---|---|
| POST | `/chat` | Send message → get AI response |
| POST | `/compare` | Side-by-side RAG vs no-RAG comparison |
| POST | `/scout` | Targeted competitor/market intelligence search |
| POST | `/scout/digest` | Full scout agent digest for a client |
| POST | `/digest` | Manually trigger daily digest pipeline |
| GET | `/admin/stats/{client_id}` | RAG monitoring stats |
| POST | `/forecast/run/{client_id}` | Trigger Prophet forecasting |
| GET | `/forecast/{client_id}/{branch}` | Get cached forecast |
| GET | `/health` | System health check |
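As an example of the forecasting endpoints, the snippet below triggers a Prophet run and then reads the cached result; the branch name and the response schema are assumptions based only on the paths above:

```python
# Illustrative use of the forecast endpoints; the exact response schema is not documented here.
import requests

BASE = "http://localhost:8000"
client_id = "acme_corp"

requests.post(f"{BASE}/forecast/run/{client_id}", timeout=300)  # trigger Prophet run
forecast = requests.get(f"{BASE}/forecast/{client_id}/Seattle", timeout=30)  # cached result
print(forecast.status_code, forecast.json())
```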
Example Questions to Ask
| Category | Question |
|---|---|
| Simple data | "What is the total revenue for the Seattle branch?" |
| Channel analysis | "Show me revenue by channel for Seattle" |
| Rankings | "What are the top 5 branches by revenue?" |
| ROI | "What is the ROI for New York City?" |
| Country drill-down | "Show me total revenue by country for Australia" |
| Recommendations | "How can we improve revenue for the Seattle branch?" |
| Strategy | "What strategy would you recommend for the London branch?" |
| Trends | "What is the revenue trend from 2011 to 2014?" |
| Channel comparison | "Which channel has the highest ROI overall?" |
| Bottom performers | "What are the bottom 10 performing branches?" |
Adding a New Client
- Create `config/clients/client_<name>.yaml` (copy from `client_acme.yaml`; see the loader sketch below)
- Create `soul/<name>_soul.md` with personality/rules
- Create `heartbeat/<name>_heartbeat.md` with schedule
- Place data in `data/raw/` and run the seed + RAG scripts
- Restart — zero code changes required
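For orientation, a rough sketch of loading such a per-client YAML, in the spirit of config/client_loader.py; the structure of the returned config is illustrative, not the project's actual schema:

```python
# Illustrative loader for config/clients/client_<name>.yaml; the config keys are assumptions.
from pathlib import Path
import yaml

def load_client(name: str) -> dict:
    path = Path("config/clients") / f"client_{name}.yaml"
    with path.open() as f:
        cfg = yaml.safe_load(f)
    # cfg might carry e.g. a client_id, soul/heartbeat file paths, data sources, Slack channel
    return cfg

if __name__ == "__main__":
    print(load_client("acme"))
```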
Project Structure
clawrity/
├── main.py # FastAPI application + lifespan
├── agents/
│ ├── orchestrator.py # Pipeline coordinator (retry loop)
│ ├── gen_agent.py # LLM response generation
│ ├── qa_agent.py # Hallucination checker
│ └── scout_agent.py # Competitor intelligence
├── config/
│ ├── settings.py # pydantic-settings from .env
│ ├── llm_client.py # LLM factory (Groq/NVIDIA) with retry
│ ├── client_loader.py # YAML client config loader
│ └── clients/client_acme.yaml
├── channels/
│ ├── protocol_adapter.py # Message normalisation
│ ├── slack_handler.py # Slack Socket Mode
│ └── teams_handler.py # Teams stub
├── skills/
│ ├── nl_to_sql.py # Natural language → SQL
│ ├── postgres_connector.py # PostgreSQL + pgvector
│ └── web_search.py # Tavily + DuckDuckGo
├── rag/
│ ├── preprocessor.py # Data cleaning
│ ├── chunker.py # Semantic chunking
│ ├── vector_store.py # Embed + pgvector store
│ ├── retriever.py # Intent-based retrieval
│ ├── evaluator.py # RAG quality metrics
│ └── monitoring.py # JSONL interaction logging
├── soul/
│ ├── soul_loader.py
│ └── acme_soul.md
├── heartbeat/
│ ├── heartbeat_loader.py
│ ├── scheduler.py # APScheduler digest jobs
│ └── acme_heartbeat.md
├── forecasting/
│ └── prophet_engine.py # Prophet time series
├── connectors/
│ ├── base_connector.py
│ └── csv_connector.py
├── etl/
│ └── normaliser.py
├── scripts/
│ ├── seed_demo_data.py # Seed PostgreSQL from CSV
│ └── run_rag_pipeline.py # Preprocess → chunk → embed
├── docker-compose.yml
├── Dockerfile
└── requirements.txt
Troubleshooting
| Issue | Fix |
|---|---|
| Connection refused on `/chat` | PostgreSQL not running — `docker compose up -d postgres` |
| Rate limited (429) | LLM API throttling — the system auto-retries with backoff |
| `No module named 'X'` | Activate the venv: `source venv/bin/activate` |
| Slack bot not responding | Check `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` in `.env` |
| Clawrity digest unavailable | Set a valid `ACME_SLACK_WEBHOOK` in `.env` |
| Embeddings slow on first run | MiniLM downloads ~80 MB on first use — subsequent runs use the cached model |
License
Private — internal use only.