Publication Title
Respeak. A voice-based, crowd-powered speech transcription system
Title of the conference
Human Factors in Computing Systems
Publication Type
Conference Proceedings
Date of the conference
Year of publication


Speech transcription is an expensive service with high turnaround time for audio files containing languages spoken in developing countries and regional accents of well-represented languages. We present Respeak — a voice-based, crowd-powered system that capitalizes on the strengths of crowd-sourcing and automatic speech recognition (instead of typing) to transcribe such audio files. We created Respeak and optimized its design through a series of cognitive experiments. We deployed it with 25 university students in India who completed 5464 micro-transcription tasks, transcribing 55 minutes of widely-varied audio content, and collectively earning USD 46 as mobile airtime. The Respeak engine aligned the transcript generated by five randomly selected users to transcribe Hindi and Indian English audio files with a word error rate (WER) of 8.6% and 15.2%, respectively. The cost of speech transcription was USD 0.83 per minute with a turnaround time of 39.8 hours, substantially less than industry standards. Using a mixed-methods analysis of cognitive experiments, system performance and qualitative interviews, we evaluate Respeak 's design, user experience, strengths, and weaknesses. Our findings suggest that Respeak improves the quality of speech transcription while enhancing the earning potential of low-income populations in resource-constrained settings.
Submitted by Miguel Llanos on Wed, 01/05/2019 - 04:42