response redundancy fixed and proper backend communication

2026-05-05 17:58:58 +05:30
parent 711d691870
commit ba61963d6f
12 changed files with 880 additions and 287 deletions
# Clawrity
**Multi-channel AI business intelligence agent.** Ask questions in natural language via Slack or REST API and get data-grounded answers with specific numbers, daily digests, budget recommendations, ROI forecasts, and competitor intelligence.
---
## Architecture
Built on the **OpenClaw pattern**:
```
User (Slack/API) → ProtocolAdapter → Orchestrator → NL-to-SQL → PostgreSQL
                                         ├─ Gen Agent (LLM) → QA Agent → Response
                                         ├─ RAG Retriever (pgvector)
                                         └─ Scout Agent (web search)
```
- **ProtocolAdapter** — normalises messages from any channel (Slack, Teams, etc.)
- **Orchestrator** — coordinates the full pipeline with retry logic
- **Gen Agent** — generates data-grounded responses with specific figures
- **QA Agent** — validates responses for hallucinations (branch names, numbers)
- **Scout Agent** — fetches competitor/sector news via Tavily
- **RAG Retriever** — semantic search over historical business data (pgvector)
- **SOUL.md** — per-client personality, rules, and business context
- **HEARTBEAT.md** — autonomous daily digest scheduling
All intelligence lives in the Clawrity backend; the OpenClaw layer contains zero business logic.
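The Orchestrator's generate-then-validate retry loop can be sketched as follows. This is a hypothetical sketch, not the actual Clawrity code: the function names, the `Draft` type, and the retry count are all assumptions.

```python
# Hypothetical sketch of the Orchestrator retry loop; the agent functions
# here are stubs, and MAX_RETRIES is an assumption, not the project's value.
from dataclasses import dataclass

MAX_RETRIES = 2

@dataclass
class Draft:
    text: str
    grounded: bool  # True if the QA check accepted the draft

def gen_agent(question: str, context: list[str]) -> str:
    # Stub: the real agent prompts the LLM with retrieved context.
    return f"Answer to {question!r} from {len(context)} context chunks"

def qa_agent(answer: str, context: list[str]) -> bool:
    # Stub: the real agent checks branch names and figures against the data.
    return bool(context)

def orchestrate(question: str, context: list[str]) -> Draft:
    """Generate, validate, and regenerate until QA accepts or retries run out."""
    for _ in range(MAX_RETRIES + 1):
        answer = gen_agent(question, context)
        if qa_agent(answer, context):
            return Draft(answer, grounded=True)
    return Draft(answer, grounded=False)  # last draft, flagged as ungrounded
```

The key design point shown here is that a draft failing QA is regenerated rather than returned, and the final fallback is explicitly flagged as ungrounded.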
---
## Tech Stack
|---|---|
| Language | Python 3.11 |
| API Framework | FastAPI + uvicorn |
| LLM | Groq (llama-3.3-70b-versatile) or NVIDIA NIM |
| Embeddings | sentence-transformers all-MiniLM-L6-v2 (384d) |
| Database | PostgreSQL + pgvector |
| Channel | Slack Bolt SDK (Socket Mode) |
| Scheduler | APScheduler |
| Web Search | Tavily API + DuckDuckGo fallback |
| Forecasting | Prophet |
---
## Quick Start (From Scratch)
### Prerequisites
- Python 3.11+
- Docker & Docker Compose
- [Groq API key](https://console.groq.com) (free)
- [Tavily API key](https://app.tavily.com) (free)
### 1. Clone & Setup
```bash
git clone <your-repo-url>
cd clawrity
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # Linux/Mac
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
```
### 2. Configure Environment
```bash
cp .env.example .env
```
Edit `.env` and fill in your keys:
```env
GROQ_API_KEY=gsk_... # from console.groq.com
DATABASE_URL=postgresql://user:pass@localhost:5432/clawrity
TAVILY_API_KEY=tvly-... # from app.tavily.com
# Slack (optional — for Slack integration)
SLACK_BOT_TOKEN=xoxb-...
SLACK_APP_TOKEN=xapp-...
SLACK_SIGNING_SECRET=...
# Digest webhook (optional)
ACME_SLACK_WEBHOOK=https://hooks.slack.com/services/...
```
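A minimal sketch of how these values might be loaded at startup. The real `config/settings.py` uses pydantic-settings, so the dataclass below is only a stdlib stand-in showing the shape of the config:

```python
# Stdlib stand-in for config/settings.py; the real project uses
# pydantic-settings. Field names mirror the .env keys shown above.
import os
from dataclasses import dataclass, field

def _env(name: str, default: str = "") -> str:
    return os.environ.get(name, default)

@dataclass
class Settings:
    groq_api_key: str = field(default_factory=lambda: _env("GROQ_API_KEY"))
    tavily_api_key: str = field(default_factory=lambda: _env("TAVILY_API_KEY"))
    database_url: str = field(default_factory=lambda: _env(
        "DATABASE_URL", "postgresql://user:pass@localhost:5432/clawrity"))

    def validate(self) -> None:
        # Fail fast at startup if a required key is missing.
        if not self.groq_api_key:
            raise RuntimeError("GROQ_API_KEY is required; see .env.example")
```

Validating eagerly at startup surfaces a missing key immediately instead of on the first LLM call.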
### 3. Start PostgreSQL + pgvector
docker compose up -d postgres
```
Wait ~10 seconds for PostgreSQL to initialize, then verify:
```bash
docker compose ps
# postgres should show "healthy"
```
### 4. Download Datasets
Download these two Kaggle datasets and place the files in `data/raw/`:
1. **Global Superstore**: https://kaggle.com/datasets/apoorvaappz/global-super-store-dataset
2. **Marketing Campaign Performance**: https://kaggle.com/datasets/manishabhatt22/marketing-campaign-performance-dataset
```bash
mkdir -p data/raw data/processed
# Place Global_Superstore2.csv and marketing_campaign_dataset.csv in data/raw/
```
### 5. Seed Demo Data
```bash
python scripts/seed_demo_data.py --client_id acme_corp \
--marketing data/raw/marketing_campaign_dataset.csv
```
### 6. Run RAG Pipeline
```bash
python scripts/run_rag_pipeline.py --client_id acme_corp
```
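The pipeline preprocesses, chunks, and embeds the seeded data. The real `rag/chunker.py` does semantic chunking; the fixed-size character version below only illustrates the chunk-plus-overlap idea, and the size/overlap defaults are assumptions:

```python
# Illustrative chunker, not the project's semantic one. Overlap ensures
# content cut at a chunk boundary still appears whole in a neighbour.
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` chars."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```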
### 7. Start the Server
```bash
uvicorn main:app --reload --port 8000
```
Server runs at `http://localhost:8000`. Health check: `http://localhost:8000/health`
---
## Test the API
```bash
# Simple question
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"client_id": "acme_corp", "message": "What is the total revenue for the Seattle branch?"}'
# Recommendation question
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"client_id": "acme_corp", "message": "How can we improve revenue for the Seattle branch?"}'
# Trigger digest
curl -X POST http://localhost:8000/digest \
-H "Content-Type: application/json" \
-d '{"client_id": "acme_corp"}'
```
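The same calls work from Python using only the standard library. `build_chat_request` is a hypothetical helper, not part of the project; the endpoint and payload shape match the `/chat` examples above:

```python
# Stdlib-only equivalent of the curl calls above. build_chat_request is a
# hypothetical helper; the payload shape matches the /chat examples.
import json
import urllib.request

def build_chat_request(message: str, client_id: str = "acme_corp",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request with the JSON body the /chat endpoint expects."""
    payload = json.dumps({"client_id": client_id, "message": message}).encode()
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(message: str, **kwargs) -> dict:
    """Send the request and decode the JSON response (server must be running)."""
    with urllib.request.urlopen(build_chat_request(message, **kwargs)) as resp:
        return json.loads(resp.read())
```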
---
## Slack Bot Setup (Socket Mode)
### 1. Create Slack App
1. Go to https://api.slack.com/apps
2. Click **Create New App** → **From scratch**
3. Name it `Clawrity` and select your workspace
### 2. Enable Socket Mode
1. Left sidebar **Socket Mode** → Toggle ON
2. Generate Token → name it `clawrity-socket`
3. Copy the `xapp-...` token → paste into `.env` as `SLACK_APP_TOKEN`
### 3. Configure Bot Permissions
1. **OAuth & Permissions** → **Bot Token Scopes**, add:
- `app_mentions:read`
- `chat:write`
- `channels:history`
- `channels:read`
- `im:history`
- `im:read`
- `im:write`
2. Click **Install to Workspace**
3. Copy the `xoxb-...` token → paste into `.env` as `SLACK_BOT_TOKEN`
### 4. Enable Events
1. **Event Subscriptions** → Toggle ON
2. Under **Subscribe to bot events**, add:
- `app_mention`
- `message.channels`
- `message.im`
3. Click **Save Changes**
### 5. Get Signing Secret
1. **Basic Information** → **App Credentials**
2. Copy **Signing Secret** → paste into `.env` as `SLACK_SIGNING_SECRET`
### 6. Invite Bot to Channel
In Slack, go to your desired channel and type:
```
/invite @Clawrity
```
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/chat` | Send message → get AI response |
| `POST` | `/compare` | Side-by-side RAG vs no-RAG comparison |
| `POST` | `/scout` | Targeted competitor/market intelligence search |
| `POST` | `/scout/digest` | Full scout agent digest for a client |
| `POST` | `/digest` | Manually trigger daily digest pipeline |
| `GET` | `/admin/stats/{client_id}` | RAG monitoring stats |
| `POST` | `/forecast/run/{client_id}` | Trigger Prophet forecasting |
| `GET` | `/forecast/{client_id}/{branch}` | Get cached forecast |
| `GET` | `/health` | System health check |
---
## Example Questions to Ask
| Category | Question |
|----------|----------|
| Simple data | "What is the total revenue for the Seattle branch?" |
| Channel analysis | "Show me revenue by channel for Seattle" |
| Rankings | "What are the top 5 branches by revenue?" |
| ROI | "What is the ROI for New York City?" |
| Country drill-down | "Show me total revenue by country for Australia" |
| Recommendations | "How can we improve revenue for the Seattle branch?" |
| Strategy | "What strategy would you recommend for the London branch?" |
| Trends | "What is the revenue trend from 2011 to 2014?" |
| Channel comparison | "Which channel has the highest ROI overall?" |
| Bottom performers | "What are the bottom 10 performing branches?" |
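Questions like these are turned into SQL by the NL-to-SQL skill, which is LLM-driven. The toy rule-based mapper below only illustrates the contract (question in, parameterised SQL out); the `sales`, `branch`, and `revenue` names are assumptions, not the project's actual schema:

```python
# Toy illustration of the NL-to-SQL contract. The real skill is LLM-driven;
# the table/column names here are assumptions about the schema.
import re

def question_to_sql(question: str) -> tuple[str, tuple]:
    """Map a narrow class of revenue questions to parameterised SQL."""
    m = re.search(r"total revenue for the (\w+) branch", question, re.IGNORECASE)
    if m:
        return ("SELECT SUM(revenue) FROM sales WHERE branch = %s", (m.group(1),))
    m = re.search(r"top (\d+) branches by revenue", question, re.IGNORECASE)
    if m:
        return ("SELECT branch, SUM(revenue) AS total FROM sales "
                "GROUP BY branch ORDER BY total DESC LIMIT %s", (int(m.group(1)),))
    raise ValueError("question not recognised by this toy mapper")
```

Returning parameterised SQL plus a value tuple, rather than an interpolated string, is what keeps generated queries safe to execute.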
---
## Adding a New Client
1. Create `config/clients/client_<name>.yaml` (copy from `client_acme.yaml`)
2. Create `soul/<name>_soul.md` with personality/rules
3. Create `heartbeat/<name>_heartbeat.md` with schedule
4. Place data in `data/raw/` and run seed + RAG scripts
5. Restart — zero code changes required
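A new client config might look like the following. Every key below is hypothetical, since the real schema lives in `client_acme.yaml`; copy that file and adjust:

```yaml
# Hypothetical shape of config/clients/client_newclient.yaml; the actual
# keys live in client_acme.yaml.
client_id: newclient
display_name: NewClient Inc.
soul_file: soul/newclient_soul.md
heartbeat_file: heartbeat/newclient_heartbeat.md
data_sources:
  - data/raw/newclient_sales.csv
```

Because clients are defined entirely in YAML plus markdown, the restart in step 5 picks the new client up with no code changes.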
```
clawrity/
├── main.py                     # FastAPI application + lifespan
├── agents/
│   ├── orchestrator.py         # Pipeline coordinator (retry loop)
│   ├── gen_agent.py            # LLM response generation
│   ├── qa_agent.py             # Hallucination checker
│   └── scout_agent.py          # Competitor intelligence
├── config/
│   ├── settings.py             # pydantic-settings from .env
│   ├── llm_client.py           # LLM factory (Groq/NVIDIA) with retry
│   ├── client_loader.py        # YAML client config loader
│   └── clients/client_acme.yaml
├── channels/
│   ├── protocol_adapter.py     # Message normalisation
│   ├── slack_handler.py        # Slack Socket Mode
│   └── teams_handler.py        # Teams stub
├── skills/
│   ├── nl_to_sql.py            # Natural language → SQL
│   ├── postgres_connector.py   # PostgreSQL + pgvector
│   └── web_search.py           # Tavily + DuckDuckGo
├── rag/
│   ├── preprocessor.py         # Data cleaning
│   ├── chunker.py              # Semantic chunking
│   ├── vector_store.py         # Embed + pgvector store
│   ├── retriever.py            # Intent-based retrieval
│   ├── evaluator.py            # RAG quality metrics
│   └── monitoring.py           # JSONL interaction logging
├── soul/
│   ├── soul_loader.py
│   └── acme_soul.md
├── heartbeat/
│   ├── heartbeat_loader.py
│   ├── scheduler.py            # APScheduler digest jobs
│   └── acme_heartbeat.md
├── forecasting/
│   └── prophet_engine.py       # Prophet time series
├── connectors/
│   ├── base_connector.py
│   └── csv_connector.py
├── etl/
│   └── normaliser.py
├── scripts/
│   ├── seed_demo_data.py       # Seed PostgreSQL from CSV
│   └── run_rag_pipeline.py     # Preprocess → chunk → embed
├── docker-compose.yml
├── Dockerfile
└── requirements.txt
```
---
## Troubleshooting
| Issue | Fix |
|-------|-----|
| `Connection refused` on /chat | PostgreSQL not running — `docker compose up -d postgres` |
| `Rate limited (429)` | LLM API throttling — system auto-retries with backoff |
| `No module named 'X'` | Activate venv: `source venv/bin/activate` |
| Slack bot not responding | Check `SLACK_BOT_TOKEN` and `SLACK_APP_TOKEN` in `.env` |
| `Clawrity digest unavailable` | Set valid `ACME_SLACK_WEBHOOK` in `.env` |
| Embeddings slow on first run | MiniLM downloads ~80MB on first use — subsequent runs are cached |
---
## License
Private — internal use only.