1️⃣ PhonoCoach

A Chrome extension with a FastAPI backend that gives real-time pronunciation feedback through phoneme analysis, powered by OpenAI's Whisper ASR model.


2️⃣ Getting Started

🛠️ Prerequisites

  • Python 3.12+
  • pip
  • Chrome browser
  • ffmpeg installed and available in your system PATH.
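As a quick sanity check before installing, the prerequisites above can be verified from Python's standard library (`check_prereqs` is a hypothetical helper, not part of the repo):

```python
import shutil
import sys

def check_prereqs():
    """Report whether Python 3.12+ and ffmpeg are available."""
    return {
        "python_3_12_plus": sys.version_info >= (3, 12),
        # shutil.which returns None when ffmpeg is not on PATH
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }

print(check_prereqs())
```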
```bash
# a) Clone the repository
git clone https://github.com/Manoj-HV30/phonocoach.git
cd phonocoach

# b) Create a virtual environment
python3 -m venv venv

# c) Activate the virtual environment
# Linux/macOS:
source venv/bin/activate
# Windows (PowerShell):
# venv\Scripts\Activate.ps1

# d) Install dependencies
pip install -r requirements.txt

# e) Start the FastAPI backend server
uvicorn backend.server:app --reload
```
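Once uvicorn is up, you can confirm the backend is reachable; FastAPI serves its interactive API docs at `/docs` by default. The helper below is a minimal sketch assuming uvicorn's default port 8000:

```python
import urllib.request

def backend_up(url="http://127.0.0.1:8000/docs", timeout=2):
    """Return True if the local FastAPI backend answers at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, or HTTP error: backend not usable
        return False
```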

🖥️ Load the Chrome Extension Locally

  1. Open Chrome and go to chrome://extensions/
  2. Enable Developer mode (toggle in the top-right) 🛠️
  3. Click Load unpacked and select the frontend folder inside the cloned repo 📂
  4. The PhonoCoach icon should appear in your toolbar 🚀

3️⃣ PhonoCoach in action


Using PhonoCoach

  1. Select any text on a webpage.
  2. Open the PhonoCoach popup.
  3. Click 🎙 Record to start recording your voice.
  4. Click Stop to upload audio and analyze pronunciation.
  5. View similarity score, phoneme-level feedback, and improvement tips.

4️⃣ Features

  • Real-time pronunciation analysis for any selected text on any webpage
  • Phoneme-level feedback highlighting correct, incorrect, missing, and extra sounds
  • Similarity score to quantify pronunciation accuracy
  • Improvement tips based on your performance
  • Uses OpenAI's Whisper ASR for accurate speech-to-text transcription
  • Lightweight FastAPI backend for fast processing
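The exact scoring logic lives in the backend, but the feedback categories above (correct, incorrect, missing, extra) map naturally onto sequence alignment. A minimal sketch with the standard library, assuming phonemes arrive as lists of symbols:

```python
import difflib

def phoneme_feedback(expected, actual):
    """Align two phoneme sequences and classify each span.

    Returns (similarity, feedback) where feedback is a list of
    (category, phonemes) pairs.
    """
    sm = difflib.SequenceMatcher(None, expected, actual)
    feedback = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            feedback.append(("correct", expected[i1:i2]))
        elif tag == "replace":
            feedback.append(("incorrect", actual[j1:j2]))
        elif tag == "delete":
            feedback.append(("missing", expected[i1:i2]))
        elif tag == "insert":
            feedback.append(("extra", actual[j1:j2]))
    return sm.ratio(), feedback

# "thought" /θ ɔ t/ pronounced as /t ɔ t/ (th -> t substitution)
score, fb = phoneme_feedback(["θ", "ɔ", "t"], ["t", "ɔ", "t"])
# score ≈ 0.67; fb: [('incorrect', ['t']), ('correct', ['ɔ', 't'])]
```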

5️⃣ Dev Notes

  • Make sure the backend server is running before using the Chrome extension.
  • Ensure ffmpeg is installed and accessible in your system PATH.
  • ⚠️ The Chrome extension is loaded locally and is NOT YET PUBLISHED on the Web Store.
  • In backend/server.py, the Whisper model "small" is loaded by default. You can switch to another model in server.py to match your system's RAM and processing power:

| Model  | Approx. RAM required | Recommended use case                        |
|--------|----------------------|---------------------------------------------|
| tiny   | ~1 GB                | Low-resource machines, faster processing    |
| base   | ~2 GB                | Lightweight, reasonable accuracy            |
| small  | ~4 GB                | Default, good balance of speed and accuracy |
| medium | ~8 GB                | Higher accuracy, slower processing          |
| large  | ~16+ GB              | Maximum accuracy, requires powerful CPU/GPU |
```python
import whisper

# Change the model here:
# Options: "tiny", "base", "small", "medium", "large"
model = whisper.load_model("small")
```
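If you'd rather automate the choice, the table above can be condensed into a small helper (hypothetical, not part of the repo):

```python
def pick_whisper_model(ram_gb):
    """Pick a Whisper model name from available RAM in GB, per the table above."""
    if ram_gb >= 16:
        return "large"
    if ram_gb >= 8:
        return "medium"
    if ram_gb >= 4:
        return "small"
    if ram_gb >= 2:
        return "base"
    return "tiny"

# e.g. model = whisper.load_model(pick_whisper_model(8))  # -> "medium"
```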