Rev.ai offers the most accurate speech-to-text (STT) APIs to meet your business and technical needs: asynchronous STT and streaming STT.

Asynchronous STT

Asynchronous STT produces transcripts from pre-recorded audio files. Our features include:

  • Global English coverage - our single English model supports all major English accents.
  • Best-in-class accuracy - we train our speech models on tens of thousands of hours of human-transcribed audio content. As a result, Rev.ai achieved the lowest word error rate (WER) in our benchmarking tests, beating Google, Amazon, Microsoft, and Speechmatics.
  • Timestamps - receive a timestamp for every word that is said.
  • Advanced punctuation, capitalization, and inverse text normalization (ITN) - transcripts are automatically polished so they're readable and easy to understand.
  • Verbatim - capture every word that is said, including "ums" and "uhs", or remove these filler words with an optional API toggle.
  • Speaker diarization - recognize multiple speakers and attribute text to each.
  • Speaker channel support - process multi-channel audio on up to eight distinct channels.
  • Custom vocabulary - provide unique terms so the speech model can recognize them.
  • Deployment options - deploy Rev.ai in the cloud or on-prem. 
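
For readers unfamiliar with the word error rate (WER) metric mentioned above, here is a minimal sketch of how WER is commonly computed: the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. This is an illustration only, not Rev.ai's benchmarking code; the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration; real benchmarks usually apply
    more careful text normalization before comparing words.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the transcript "the cat sit on mat" gives one substitution and one deletion over six reference words, so the WER is 2/6, or about 0.33. A lower WER means a more accurate transcript.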

Need more technical details? Read the documentation for our async API. 

Live Streaming STT

Live Streaming STT produces transcripts in real time as people speak. Our features include:

  • Global English coverage - our single English model supports all major English accents. 
  • Best-in-class accuracy - we train our speech models on tens of thousands of hours of human-transcribed audio content. As a result, Rev.ai achieved the lowest word error rate (WER) in our benchmarking tests, beating Google, Amazon, Microsoft, and Speechmatics.
  • Advanced punctuation, capitalization, and inverse text normalization (ITN) - transcripts are automatically polished so they're readable and easy to understand. 
  • Timestamps - receive a timestamp for every word that is said.
  • Verbatim - capture every word that is said, including "ums" and "uhs".
  • Filter 600 bad words - don't worry about offending your viewers. Bad words are replaced with asterisks (e.g., s**t).
  • Custom vocabulary - provide unique terms so the speech model can recognize them.
  • Deployment options - deploy Rev.ai in the cloud (we are investigating offering on-prem in the future).
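
To illustrate the asterisk masking described in the bad-word filter above, here is a minimal sketch. It is not Rev.ai's implementation: the function names, the rule of keeping the first and last letters (inferred from the "s**t" example), and the placeholder word list are all assumptions made for the example.

```python
import re


def mask_word(word: str) -> str:
    """Replace the inner letters of a word with asterisks (hypothetical rule)."""
    if len(word) <= 2:
        return "*" * len(word)
    return word[0] + "*" * (len(word) - 2) + word[-1]


def mask_profanity(text: str, blocked_words) -> str:
    """Mask every word in `text` that appears in `blocked_words`.

    `blocked_words` is a placeholder list supplied by the caller; the actual
    filter list used by the service is not reproduced here.
    """
    blocked = {w.lower() for w in blocked_words}

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Compare case-insensitively, but keep the original casing in the output.
        return mask_word(word) if word.lower() in blocked else word

    return re.sub(r"[A-Za-z']+", replace, text)
```

With the placeholder list `{"darn"}`, the input "Well, darn it!" becomes "Well, d**n it!", while words not on the list pass through unchanged.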

Need more technical details? Read the documentation for our streaming API. 
