From ed1b3cf99f2ed9ca4a013f168d47d562f3beb24a Mon Sep 17 00:00:00 2001
From: Manoj HV
Date: Sat, 16 Aug 2025 06:48:16 +0530
Subject: [PATCH] added README.md

---
 README.md | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 28faba6..591773c 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,85 @@

## 1️⃣ PhonoCoach

A Chrome extension with a FastAPI backend that provides real-time pronunciation feedback through phoneme analysis, powered by OpenAI's Whisper ASR model.

---

## 2️⃣ Getting Started

### 🛠️ Prerequisites

- **Python 3.12+**
- **pip**
- **Chrome browser**
- **ffmpeg** installed and available in your system PATH

```bash
# a) Clone the repository
git clone https://github.com/Manoj-HV30/phonocoach.git
cd phonocoach

# b) Create a virtual environment
python3 -m venv venv

# c) Activate the virtual environment
# Linux/macOS:
source venv/bin/activate
# Windows (PowerShell):
# venv\Scripts\Activate.ps1

# d) Install dependencies
pip install -r requirements.txt

# e) Start the FastAPI backend server
uvicorn backend.server:app --reload
```

### 🖥️ Load the Chrome Extension Locally

1. Open Chrome and go to `chrome://extensions/`
2. Enable **Developer mode** (toggle in the top-right) 🛠️
3. Click **Load unpacked** and select the `frontend` folder inside the cloned repo 📂
4. The **PhonoCoach** icon should appear in your toolbar 🚀

## 3️⃣ PhonoCoach in Action

[![PhonoCoach in Action](https://i.postimg.cc/1tjyg0rG/2025-08-16-05-17.png)](https://postimg.cc/VrnxpCTv)
[![PhonoCoach in Action](https://i.postimg.cc/0jvr50kV/2025-08-16-05-38.png)](https://postimg.cc/Vr7zDMbt)

### Using PhonoCoach

1. Select any text on a webpage.
2. Open the **PhonoCoach** popup.
3. Click **🎙 Record** to start recording your voice.
4. Click **Stop** to upload the audio and analyze your pronunciation.
5. View the similarity score, phoneme-level feedback, and improvement tips.

## 4️⃣ Features ✨

- Real-time pronunciation analysis for any selected text on any webpage
- Phoneme-level feedback highlighting correct, incorrect, missing, and extra sounds
- Similarity score to quantify pronunciation accuracy
- Improvement tips based on your performance
- Uses OpenAI's Whisper ASR for accurate speech-to-text transcription
- Lightweight FastAPI backend for fast processing

## 5️⃣ Dev Notes

- Make sure the backend server is running before using the Chrome extension.
- Ensure `ffmpeg` is installed and accessible in your system PATH. ⚡
- ⚠️ The Chrome extension is loaded locally and is ***NOT YET PUBLISHED*** on the Web Store.
- In `backend/server.py`, the Whisper model `"small"` is loaded by default. You can switch to another model based on your system's processing power:

| Model  | Approx. RAM Required | Recommended Use Case |
|--------|----------------------|----------------------|
| tiny   | ~1 GB   | Low-resource machines, faster processing |
| base   | ~2 GB   | Lightweight, reasonable accuracy |
| small  | ~4 GB   | Default, good balance of speed and accuracy |
| medium | ~8 GB   | Higher accuracy, slower processing |
| large  | ~16+ GB | Maximum accuracy, requires a powerful CPU/GPU |

- Adjust the model in `server.py` according to your available RAM and processing power:

```python
import whisper

# Change the model here:
# Options: "tiny", "base", "small", "medium", "large"
model = whisper.load_model("small")
```
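The project's real comparison logic lives in the backend, but the idea behind "correct / incorrect / missing / extra" phoneme feedback and a similarity score can be sketched with Python's standard-library `difflib.SequenceMatcher` alone. The `phoneme_feedback` helper and the ARPAbet example below are illustrative assumptions, not PhonoCoach's actual implementation:

```python
from difflib import SequenceMatcher

def phoneme_feedback(expected, spoken):
    """Compare two phoneme sequences (illustrative sketch, not the real backend).

    Returns a similarity score in [0, 1] and a list of
    (tag, expected_slice, spoken_slice) feedback entries.
    """
    matcher = SequenceMatcher(None, expected, spoken)
    feedback = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            feedback.append(("correct", expected[i1:i2], spoken[j1:j2]))
        elif tag == "replace":
            feedback.append(("incorrect", expected[i1:i2], spoken[j1:j2]))
        elif tag == "delete":
            feedback.append(("missing", expected[i1:i2], []))
        else:  # "insert"
            feedback.append(("extra", [], spoken[j1:j2]))
    return matcher.ratio(), feedback

# Example: "think" /TH IH NG K/ pronounced as "tink" /T IH NG K/
score, feedback = phoneme_feedback(
    ["TH", "IH", "NG", "K"],
    ["T", "IH", "NG", "K"],
)
print(round(score, 2))  # 0.75
for tag, exp, got in feedback:
    print(tag, exp, got)
```

`SequenceMatcher.ratio()` gives `2 * matches / total length`, which maps naturally onto a percentage-style similarity score, and its opcodes directly label each aligned phoneme span.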