
Audio Transcriber

ROLE

Full-stack Developer

TECHNOLOGIES

JavaScript, Node.js, Express, OpenAI Whisper API, AWS S3

DURATION

08/2024 - 10/2024 (3 months)

GITHUB

View Repository

Brief

An audio transcription application that uses the OpenAI Whisper API to convert spoken language into text, with support for multiple languages and accents.

The Audio Transcriber provides a web interface for uploading audio files, runs them through speech recognition, and returns transcripts with timestamps and speaker diarization.

My Contribution

As the sole developer on this project, I designed and implemented the entire application stack:

Developed a web interface in JavaScript and HTML5 for uploading audio and tracking each transcription, with real-time progress indicators and a transcript preview.
Built a backend API with Node.js and Express that handles file uploads, manages transcription jobs, and returns formatted results to the client.
Integrated the OpenAI Whisper API for multi-language transcription with timestamps, and handled speaker identification in the transcript processing layer, since Whisper itself does not label speakers.
Implemented secure storage on AWS S3 for source audio files and generated transcripts, with access controls and lifecycle rules.
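
S3 lifecycle management is configured per bucket as a set of rules. As a sketch only, assuming uploads and transcripts live under hypothetical `uploads/` and `transcripts/` prefixes, the rules could look like this; the retention periods are illustrative, not the project's actual settings.

```javascript
// Lifecycle rules in the shape S3's PutBucketLifecycleConfiguration API expects.
// The "uploads/" and "transcripts/" prefixes and all day counts are hypothetical.
const lifecycleConfiguration = {
  Rules: [
    {
      ID: "expire-source-audio",
      Filter: { Prefix: "uploads/" },
      Status: "Enabled",
      // Delete uploaded audio seven days after it was created
      Expiration: { Days: 7 },
      // Clean up multipart uploads that were started but never completed
      AbortIncompleteMultipartUpload: { DaysAfterInitiation: 1 },
    },
    {
      ID: "archive-transcripts",
      Filter: { Prefix: "transcripts/" },
      Status: "Enabled",
      // Move older transcripts to cheaper infrequent-access storage, then expire them
      Transitions: [{ Days: 30, StorageClass: "STANDARD_IA" }],
      Expiration: { Days: 365 },
    },
  ],
};

// Applying it with the AWS SDK for JavaScript v3 (not executed here):
// const { S3Client, PutBucketLifecycleConfigurationCommand } = require("@aws-sdk/client-s3");
// await new S3Client({}).send(new PutBucketLifecycleConfigurationCommand({
//   Bucket: "my-transcriber-bucket", // hypothetical bucket name
//   LifecycleConfiguration: lifecycleConfiguration,
// }));
```

Keeping source audio short-lived while transcripts age into a cheaper storage class is one way to control storage costs without deleting results users still need.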

System Architecture

The application follows a cloud-native architecture with several key components:

Frontend: Responsive web interface built with JavaScript and modern CSS
Backend API: Node.js/Express server handling request routing and business logic
Authentication: JWT-based user authentication system
Storage Layer: AWS S3 integration for secure file storage
AI Processing: OpenAI Whisper API integration for speech recognition
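
The architecture lists JWT-based authentication without further detail. For illustration only, assuming tokens are signed with HS256 and a shared secret (an assumption, not something the project states), a dependency-free verification sketch using Node's built-in `crypto` module might look like this. `signHs256`, `verifyHs256`, and `requireAuth` are hypothetical names; in production a maintained library such as `jsonwebtoken` is the safer choice.

```javascript
const crypto = require("node:crypto");

// Encode a string or Buffer as unpadded base64url, the encoding JWTs use.
function b64url(input) {
  return Buffer.from(input).toString("base64url");
}

// Sign a payload with HMAC-SHA256 (the JWT "HS256" algorithm).
function signHs256(payload, secret) {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const signature = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest();
  return `${header}.${body}.${b64url(signature)}`;
}

// Return the payload if the signature and `exp` claim check out, else null.
function verifyHs256(token, secret, nowSec = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split(".");
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;
  const expected = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(signature, "base64url");
  // timingSafeEqual throws on a length mismatch, so compare lengths first
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  try {
    const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
    const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
    if (alg !== "HS256") return null;
    if (typeof payload.exp === "number" && payload.exp <= nowSec) return null;
    return payload;
  } catch {
    return null; // malformed header or payload JSON
  }
}

// Express-style middleware: expects an "Authorization: Bearer <token>" header.
function requireAuth(secret) {
  return (req, res, next) => {
    const header = req.headers.authorization || "";
    const token = header.startsWith("Bearer ") ? header.slice(7) : null;
    const user = token ? verifyHs256(token, secret) : null;
    if (!user) return res.status(401).json({ error: "unauthorized" });
    req.user = user; // downstream handlers read the verified claims here
    return next();
  };
}
```

Mounted ahead of the upload and job routes, a middleware like this keeps every request to the transcription API tied to a verified user.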

Key Features

Multi-language Support

Implemented support for over 20 languages with automatic language detection, allowing users to transcribe content in various languages without manual configuration.
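
Whisper detects the spoken language automatically when the request carries no language hint. A minimal sketch of a request to OpenAI's `/v1/audio/transcriptions` endpoint, assuming Node 18+ (global `fetch`, `FormData`, and `Blob`); the helper names `buildTranscriptionForm` and `transcribe` are hypothetical, not the project's code.

```javascript
// Build the multipart body for OpenAI's /v1/audio/transcriptions endpoint.
// Leaving out `language` lets Whisper auto-detect the spoken language.
function buildTranscriptionForm(audioBytes, filename, { language } = {}) {
  const form = new FormData();
  form.append("file", new Blob([audioBytes]), filename);
  form.append("model", "whisper-1");
  // verbose_json includes the detected language and per-segment timestamps
  form.append("response_format", "verbose_json");
  form.append("timestamp_granularities[]", "segment");
  if (language) form.append("language", language); // ISO-639-1 code, e.g. "de"
  return form;
}

// Send one file to the API and return the parsed verbose_json response.
async function transcribe(audioBytes, filename, apiKey, options) {
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audioBytes, filename, options),
  });
  if (!res.ok) {
    const err = new Error(`Whisper API error ${res.status}`);
    err.status = res.status; // keep the HTTP status for retry decisions
    throw err;
  }
  return res.json(); // { language, duration, text, segments: [...] }
}
```

Passing `{ language: "de" }` pins the language instead; OpenAI's documentation notes that supplying the input language may improve accuracy and latency.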

Advanced Transcript Formatting

Built a transcript processing step that adds speaker labels and timestamps and corrects punctuation, so the final transcripts are easier to read and reuse.
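
A sketch of such a formatting step, assuming segments shaped like the Whisper API's `verbose_json` output (`start` and `end` in seconds, plus `text`) and a hypothetical `speaker` field attached by a separate diarization step. The punctuation fix here is deliberately simple (collapse whitespace, capitalize, add a terminal period) and only stands in for whatever correction the project applied.

```javascript
// Format a time in seconds as HH:MM:SS.mmm.
function formatTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`;
}

// Minimal punctuation cleanup: collapse whitespace, capitalize, end with a period.
function tidySentence(text) {
  let t = text.replace(/\s+/g, " ").trim();
  if (!t) return t;
  t = t[0].toUpperCase() + t.slice(1);
  if (!/[.!?]$/.test(t)) t += ".";
  return t;
}

// segments: [{ start, end, text, speaker? }]; `speaker` is a hypothetical field.
function formatTranscript(segments) {
  return segments
    .map((seg) => ({ ...seg, text: tidySentence(seg.text) }))
    .filter((seg) => seg.text.length > 0) // drop silent or empty segments
    .map((seg) => {
      const label = seg.speaker ? `${seg.speaker}: ` : "";
      return `[${formatTimestamp(seg.start)} - ${formatTimestamp(seg.end)}] ${label}${seg.text}`;
    })
    .join("\n");
}
```

For example, a segment `{ start: 0, end: 1.5, text: " hello there ", speaker: "Speaker 1" }` becomes the line `[00:00:00.000 - 00:00:01.500] Speaker 1: Hello there.`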

Batch Processing

Designed a queue-based processing system that lets users upload several files for transcription in a single operation; queued files are then transcribed in parallel.
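
One way to sketch the parallel part of a batch queue is a fixed number of worker "lanes" pulling from a shared index, which caps how many transcriptions are in flight at once. This is an illustrative pattern, not the project's actual queue; `runWithConcurrency` is a hypothetical name.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results keep input order; each entry records success or failure separately.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next queued item to claim

  async function lane() {
    while (next < items.length) {
      const index = next++; // claim synchronously, so no two lanes share an item
      try {
        results[index] = { status: "fulfilled", value: await worker(items[index], index) };
      } catch (error) {
        results[index] = { status: "rejected", reason: error };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Because failures are recorded per file instead of thrown, one bad upload does not abort the rest of the batch, and the `limit` doubles as a simple cap on concurrent API calls.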

Technical Challenges

Several challenges were addressed during development:

API Rate Limiting: Implemented throttling and request batching to keep OpenAI API usage within rate limits without sacrificing throughput.
Large File Handling: Developed a chunking system for audio files that exceed the Whisper API's 25 MB upload limit, automatically reassembling the transcript from the per-chunk results.
Error Recovery: Built error handling that recovers from API failures and network issues without losing user data or processing progress.
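
The Whisper API rejects uploads larger than 25 MB, so long recordings have to be split. A minimal sketch of the planning and reassembly halves, assuming chunks are cut by duration with a small overlap (the actual audio cutting, for example with a tool such as ffmpeg, is out of scope) and that each chunk returns segments timed relative to its own start. `planChunks` and `mergeChunkSegments` are hypothetical names.

```javascript
// Split [0, durationSec) into windows of at most maxChunkSec,
// each starting overlapSec before the previous window ends.
function planChunks(durationSec, maxChunkSec, overlapSec = 0) {
  if (overlapSec >= maxChunkSec) throw new RangeError("overlap must be shorter than a chunk");
  const chunks = [];
  let start = 0;
  while (start < durationSec) {
    const end = Math.min(start + maxChunkSec, durationSec);
    chunks.push({ start, end });
    if (end >= durationSec) break;
    start = end - overlapSec;
  }
  return chunks;
}

// chunkResults: [{ chunk: { start, end }, segments: [{ start, end, text }] }] in order.
// Shifts segment times to absolute positions and drops overlap duplicates.
function mergeChunkSegments(chunkResults) {
  const merged = [];
  for (const { chunk, segments } of chunkResults) {
    for (const seg of segments) {
      const abs = { ...seg, start: seg.start + chunk.start, end: seg.end + chunk.start };
      const last = merged[merged.length - 1];
      // Skip segments that start inside audio the previous chunk already covered
      if (last && abs.start < last.end) continue;
      merged.push(abs);
    }
  }
  return merged;
}
```

Dropping any segment that starts before the last kept one ends is a simple overlap rule: it avoids duplicated words at chunk boundaries, but it keeps the earlier chunk's version of a boundary segment even when that version was cut short.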
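
For the rate-limit and error-recovery items, a common pattern is retrying transient failures with exponential backoff and jitter. The sketch below assumes errors carry an HTTP `status` property (a hypothetical convention, for example set by a fetch wrapper) and treats 429 and 5xx responses, plus errors without any status, as retryable; `withRetry` and `isRetryable` are hypothetical names.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumed policy: retry rate limits (429), server errors (5xx), and status-less
// errors such as network failures; other HTTP errors (e.g. 400) fail immediately.
function isRetryable(err) {
  if (typeof err?.status === "number") return err.status === 429 || err.status >= 500;
  return true;
}

// Call fn, retrying retryable failures up to `retries` extra times.
async function withRetry(fn, { retries = 4, baseDelayMs = 500, maxDelayMs = 8000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      // Exponential backoff with "full jitter": wait a random time up to the cap
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * cap);
    }
  }
}
```

Wrapping only a single API call (one chunk's transcription, say) keeps a retry from repeating work that already succeeded.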

Takeaways

This project provided valuable experience in working with AI APIs and designing systems that can handle asynchronous processing of large media files. I gained insights into optimizing API usage patterns, managing cloud storage costs, and creating user interfaces that effectively communicate processing status.

The experience with error handling and recovery strategies has been particularly valuable, teaching me practical approaches to building resilient cloud applications that can gracefully handle disruptions in third-party services.