
Best practices and technical information for Rev AI usage

Updated October 30, 2023 16:00

Where Do I Go to Upload Manually and Monitor Progress?

Once you have registered for Rev AI, go to View Account and then Recent Jobs (https://www.rev.ai/jobs). Click the UPLOAD button in the upper right-hand corner and browse to your file. Results appear in the Recent Jobs list (refreshing the browser sometimes makes the COMPLETED status appear sooner). You can download results in several formats: TXT and JSON for transcriptions, SRT and VTT for captions. Each format is downloaded separately, and you can download as many as you like.

What is Rev AI's Accuracy?

We are often asked questions like "What's Rev's accuracy?" or "How does Rev AI compare to <enter competitor name here>?" Accuracy is typically measured by Word Error Rate (WER): the proportion of words in a reference transcript that are substituted, missed, or wrongly inserted in the speech-to-text output. The lower the WER, the higher the accuracy.
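To make the metric concrete, WER is commonly computed as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the output into the reference, and N is the number of reference words. The sketch below is a generic illustration of that formula using a word-level edit distance, not Rev AI's own scoring tool; it assumes simple whitespace tokenization and lowercasing, which real evaluations usually refine (punctuation, numbers, contractions).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenization and case-insensitive comparison only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over lazy dog"
    wer = word_error_rate(ref, hyp)
    print(f"WER: {wer:.3f}  accuracy: {1 - wer:.1%}")
```

Accuracy is often quoted informally as 1 - WER. Note that WER can exceed 1.0 when the output contains many inserted words, which is one reason a single headline number can mislead.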

The short answer is, in our own test sets we get these results:

That said, this is on one particular data set. There are many factors that affect accuracy, including but not limited to:

Input audio quality
-background noise
-audio equipment quality
-compression
-sampling rate
-distant microphones and reverberation
-noisy channels
Speaker qualities
-diction, pronunciation, clarity, loudness, etc.
-accent
-dialect
Environmental qualities (speech recognition "in the wild")
-multiple speakers
-non-stationary noises (e.g., passing sirens)
-unexpected events
-many different topic domains (law, medicine, education, news, etc.)
ASR engine characteristics
-depth of training data: vocabularies across industries/subjects
-breadth of training data: coverage of different industries, accents, dialects, etc.
-AI model depth and effectiveness
-separate or combined language packs for different dialects (e.g., US vs. UK English)

Therefore, we are reluctant to state our accuracy as a single number or percentage, since it depends heavily on the first three factors above, which are largely outside our control beyond making our engine smart enough to cope with deficiencies in those areas. For the same reason, we don't believe any speech-to-text company's accuracy claims unless they are stated in very broad terms. Instead, we'll focus on the fourth factor and explain what makes us superior to almost any engine available today.

Rev.com has been around for 10 years and offers human transcription. Human transcripts and timing data, together with the associated audio, are exactly what is needed to tune an engine. We have 10 years' worth of such data covering a multitude of industries (broadcast/media, legal, medical, education, etc.) and accents/dialects from all over the world. This lets us handpick the data we need to create the best possible model, and we have done so with millions of minutes of audio.

In addition, we put all of our English into a single model so that you get the best results out of the box, regardless of whether the speaker is American, British, Australian, German, or anyone else speaking English. Many companies make you swap models in and out of memory depending on who is speaking, which is unrealistic in today's global world.

We train our models on noisy audio and, as a result, are more resilient to noisy audio than others. ASR is our core business, not one of many things we do, so we are laser-focused on continuous improvement in accuracy, performance, and features. On high-quality audio with native English speakers speaking clearly, our accuracy can reach the low-to-mid 90 percent range.

We are the ONLY speech-to-text company that offers end-to-end options, from human to fully automatic transcriptions and captions, with a few options in between. This lets us meet you where you are and provide APIs for exactly what you need in terms of accuracy, turnaround time, editing capabilities, required output formats, and cost. Tell us your needs for each of these criteria, and we'll tell you which combination of Rev products would best meet them.

Streaming Best Practices

What is a stream duration?

Stream duration refers to the number of real-world seconds that have passed since the WebSocket connection was established.

What is an audio duration?

Audio duration refers to the number of seconds of audio that have been sent over the WebSocket connection.

What are the default API limits for the WebSocket protocol?

Streaming concurrency limit is 10 streams.
Time limit per stream is 3 hours.
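A client that fans out many recordings at once may want to cap its own concurrency so it never opens an 11th stream against the default limit. The sketch below does that with `asyncio.Semaphore`. `run_stream` is a hypothetical placeholder for whatever code opens the WebSocket, sends audio and waits for the final transcript; the active/peak counters exist only to demonstrate the cap and are not part of any Rev AI SDK.

```python
import asyncio

MAX_CONCURRENT_STREAMS = 10  # mirrors the default streaming concurrency limit above

# Demo instrumentation only: track how many streams are open at once.
active_streams = 0
peak_streams = 0


async def run_stream(source: str) -> str:
    """Hypothetical placeholder: open a stream, send audio, await the final transcript."""
    global active_streams, peak_streams
    active_streams += 1
    peak_streams = max(peak_streams, active_streams)
    try:
        await asyncio.sleep(0.01)  # stand-in for real network I/O
    finally:
        active_streams -= 1
    return f"{source}: done"


async def transcribe_all(sources: list[str]) -> list[str]:
    limiter = asyncio.Semaphore(MAX_CONCURRENT_STREAMS)

    async def guarded(source: str) -> str:
        # The semaphore admits at most MAX_CONCURRENT_STREAMS callers at a time.
        async with limiter:
            return await run_stream(source)

    # gather preserves input order in its results.
    return list(await asyncio.gather(*(guarded(s) for s in sources)))


if __name__ == "__main__":
    results = asyncio.run(transcribe_all([f"call-{i}" for i in range(25)]))
    print(len(results), "streams finished; peak concurrency:", peak_streams)
```

Queuing on the client side like this is usually gentler than opening connections freely and handling rejections after the fact.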

Do we support RTMP (Real-Time Messaging Protocol)?

Yes

What are the default API limits for RTMP?

Streaming concurrency limit is 10 streams.
Time limit per stream is 3 hours.

Can we recover from ‘stream longer than 3 hours’ or other timeouts/connection errors?

Yes, we have a code sample using our Node SDK that demonstrates how to recover from connection errors and timeouts here: https://docs.rev.ai/api/streaming/code-samples/#recover-from-connection-errors-and-timeouts-during-a-stream
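The linked Node SDK sample is the reference implementation. As a language-neutral illustration of one piece of that idea, a client that must run longer than the 3-hour per-stream limit can close and reopen the stream before the limit, remembering how much audio earlier sessions carried so that timestamps from later sessions can be shifted onto one timeline. The helper below only plans those session boundaries. The 5-minute safety margin is an arbitrary assumption, and `plan_segments`, `to_global_time` and `StreamSegment` are hypothetical names.

```python
from dataclasses import dataclass

STREAM_TIME_LIMIT_S = 3 * 60 * 60  # default per-stream limit quoted above
SAFETY_MARGIN_S = 5 * 60           # assumption: reconnect a few minutes early


@dataclass
class StreamSegment:
    start_offset_s: float  # where this session starts in the overall audio
    duration_s: float      # how much audio this session should carry


def plan_segments(total_audio_s: float) -> list[StreamSegment]:
    """Split one long real-time session into back-to-back streams under the limit."""
    max_segment = STREAM_TIME_LIMIT_S - SAFETY_MARGIN_S
    segments = []
    offset = 0.0
    while offset < total_audio_s:
        length = min(max_segment, total_audio_s - offset)
        segments.append(StreamSegment(offset, length))
        offset += length
    return segments


def to_global_time(segment: StreamSegment, ts_in_session_s: float) -> float:
    """Shift a timestamp reported by one session onto the overall timeline."""
    return segment.start_offset_s + ts_in_session_s


if __name__ == "__main__":
    for seg in plan_segments(8 * 60 * 60):  # an 8-hour event
        print(seg)
```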

How many credits are needed for the initial connection?

On initial connection, each streaming session attempts to place a hold on 10 minutes of credits. If the client does not have 10 minutes of credits to hold, the WebSocket returns a 4003 insufficient credits close message. Whenever more than 5 minutes of real-time (stream duration) has passed, Rev AI attempts to place a hold on another 5 minutes of credits. Again, if the client runs out of credits, the WebSocket connection is closed with a 4003 insufficient credits close message. Enabling auto-reload is encouraged to prevent running out of credits mid-stream.
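To see how these holds add up over a long stream, the sketch below models the policy under one reading of it: 10 minutes held when the WebSocket connects, then another 5 minutes each time stream duration passes a further 5-minute boundary. The exact trigger timing on Rev AI's side is an assumption here, so treat the result as a rough estimate of the balance a stream needs, not as a billing calculator.

```python
import math

INITIAL_HOLD_MIN = 10      # hold placed when the WebSocket connects
INCREMENTAL_HOLD_MIN = 5   # each additional hold during the stream
HOLD_INTERVAL_MIN = 5      # assumption: one extra hold per 5 minutes of stream duration


def minutes_held(stream_duration_min: float) -> int:
    """Total credit minutes on hold after `stream_duration_min` of real time.

    Assumes an extra hold is placed once stream duration is strictly
    more than 5, 10, 15, ... minutes.
    """
    if stream_duration_min < 0:
        raise ValueError("stream duration cannot be negative")
    extra_holds = max(0, math.ceil(stream_duration_min / HOLD_INTERVAL_MIN) - 1)
    return INITIAL_HOLD_MIN + INCREMENTAL_HOLD_MIN * extra_holds


if __name__ == "__main__":
    for minutes in (0, 5, 5.5, 30, 60):
        print(f"{minutes:>5} min streamed -> {minutes_held(minutes)} min held")
```

Under this reading, a client needs a balance of at least `minutes_held(t)` minutes to stream for t minutes without a 4003 close, which is why auto-reload is the safer choice for long or open-ended streams.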

How is streaming billed?

Once the connection is closed, the audio duration and stream duration are finalized. You are billed for the larger of the two, with a minimum of 15 seconds. The hold on any unused credits is then removed, and those credits are returned to you for other transcription jobs.
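The billing rule reduces to a one-line calculation. This minimal sketch assumes billing is computed on raw seconds and ignores any rounding Rev AI may apply; `billable_seconds` is a hypothetical helper name.

```python
MINIMUM_BILLED_S = 15  # minimum charge per stream, per the FAQ above


def billable_seconds(audio_duration_s: float, stream_duration_s: float) -> float:
    """Billable time: the larger of audio and stream duration, but at least 15 s."""
    return max(audio_duration_s, stream_duration_s, MINIMUM_BILLED_S)


if __name__ == "__main__":
    print(billable_seconds(30, 120))   # connection left open longer than audio sent
    print(billable_seconds(4, 5))      # very short stream hits the minimum
    print(billable_seconds(600, 200))  # audio sent faster than real time
```

One practical consequence: because stream duration counts real-world time since the connection opened, an idle connection still accrues billable time, so close streams as soon as the audio ends.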

Where's my file (or how to download results)?

This section tells you how to download your transcriptions or captions once your file has been processed. If you are interacting with Rev AI by manually uploading a file through the dashboard user interface, you will get the results on that dashboard. Results generally take 15 minutes or less, and often 5 minutes or less for shorter files.

Important: You may need to refresh your browser to show the COMPLETED status.

When complete, you'll see something like this in the dashboard:

To download a result to your computer, click the icon in the "Export Options" column. You can choose between .txt, .json, .srt, and .vtt formats; the last two are caption formats.

If you just want the text, choose .txt.

If you need additional metadata for analytics or downstream applications, use the .json file.

Select the file type you want, and it will be saved to your browser's default download location.
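If you use the API instead of the dashboard, the same formats can be fetched programmatically. The sketch below builds the HTTP requests with Python's standard library. The endpoint paths (`/speechtotext/v1/jobs/{id}/transcript` and `/captions`) and the `Accept` media types reflect our understanding of the Rev AI v1 asynchronous API and should be checked against the current API reference. `build_download_request` and `download` are hypothetical helpers, and the job ID and access token are placeholders.

```python
import urllib.request

BASE_URL = "https://api.rev.ai/speechtotext/v1"  # assumption: v1 async API base URL

# Output format -> (resource, Accept header). Verify against the API reference.
FORMATS = {
    "txt": ("transcript", "text/plain"),
    "json": ("transcript", "application/vnd.rev.transcript.v1.0+json"),
    "srt": ("captions", "application/x-subrip"),
    "vtt": ("captions", "text/vtt"),
}


def build_download_request(job_id: str, fmt: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one output format of a completed job."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; choose from {sorted(FORMATS)}")
    resource, accept = FORMATS[fmt]
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}/{resource}",
        headers={"Authorization": f"Bearer {access_token}", "Accept": accept},
        method="GET",
    )


def download(job_id: str, fmt: str, access_token: str, path: str) -> None:
    """Send the request and save the response body to `path`."""
    req = build_download_request(job_id, fmt, access_token)
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

As in the dashboard, each format is a separate request, so call `download` once per format you need.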

How Long Until My Transcript is Available?

Refer to the documentation to learn more about job turnaround time for Rev AI's Asynchronous Speech-to-Text API.

How Can I Improve Accuracy with Custom Vocabulary?

You can improve the accuracy of Rev AI transcripts by submitting a list of custom vocabulary terms. Learn more about the Custom Vocabulary API and related best practices in our documentation.
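As an illustration, the sketch below normalizes a phrase list and attaches it to a job-submission body. The `custom_vocabularies` field holding a list of `{"phrases": [...]}` objects reflects our understanding of the asynchronous job API; confirm the field names, and any limits on phrase count or length, in the Custom Vocabulary documentation. The helper names and the `metadata` value in the usage example are hypothetical.

```python
import json


def build_custom_vocabulary(phrases: list[str]) -> list[dict]:
    """Normalize phrases and wrap them in the (assumed) custom_vocabularies shape."""
    seen = set()
    cleaned = []
    for phrase in phrases:
        # Collapse internal whitespace; drop empty and case-insensitive duplicates.
        normalized = " ".join(phrase.split())
        key = normalized.lower()
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return [{"phrases": cleaned}]


def add_vocabulary_to_job(job_options: dict, phrases: list[str]) -> dict:
    """Return a copy of job_options with a custom vocabulary attached."""
    options = dict(job_options)
    options["custom_vocabularies"] = build_custom_vocabulary(phrases)
    return options


if __name__ == "__main__":
    body = add_vocabulary_to_job(
        {"metadata": "earnings-call-q3"},  # hypothetical existing job options
        ["Rev AI", "  WebSocket ", "rev ai", "Patrick  Henry Winston"],
    )
    print(json.dumps(body, indent=2))
```

Case-insensitive de-duplication is a design choice here; keep case-distinct entries if your terms differ only by capitalization.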

Does Rev AI Have an On-Premise Solution?

Yes, for offline (asynchronous) transcription. Learn more about the technical requirements for on-premise deployment. For more information or to discuss deploying Rev AI on-premise, email us at support@rev.ai.

How Long Will My Jobs Be Accessible on the Rev AI Server?

Refer to the documentation on job accessibility. You can also learn about job auto-deletion and other job deletion options.

How Can I Improve the Audio Quality of My Recordings?

To improve the audio quality of your recordings, refer to our best practices for recording environments.

Can I Submit Multi-Track Audio Files?

Yes. Refer to the documentation for more information.