Gemini may be > Whisper
Audio transcription has been getting a lot better recently. In 2022, OpenAI's Whisper shocked the space with genuinely accurate transcription. Since then it has only gotten cheaper, with smaller, pruned versions like whisper-large-v3-turbo and batching making it 10,000x cheaper than a human transcriber.
But surprisingly, the cheapest transcriber isn't a model designed for that task. It's the multimodal language model Gemini. Let me show you.
DeepInfra runs Whisper Turbo for $0.0002/minute, which works out to a rather respectable $0.012/hour.
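The per-hour figure is just the per-minute price times 60. As a trivial sketch in Python, using the $0.0002/minute DeepInfra price quoted above (the function name is only illustrative):

```python
# Convert a per-minute transcription price into a per-hour price.
# 0.0002 is the DeepInfra Whisper Turbo price per minute quoted in the post.

def per_minute_to_per_hour(price_per_minute: float) -> float:
    """Return the cost of one hour of audio given a per-minute price."""
    return price_per_minute * 60

whisper_turbo_hourly = per_minute_to_per_hour(0.0002)
print(f"Whisper Turbo: ${whisper_turbo_hourly:.3f}/hour")  # prints $0.012/hour
```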
Finding the Gemini price is not as easy, because Gemini bills per token rather than per minute of audio. Let me break it down in a table, using the Flash version (other versions are too expensive or too crappy).
Type | Tokens per hour of audio | Price (Gemini Flash) | Cost per hour |
---|---|---|---|
Input | 115,329 | $0.075 / 1M tokens | $0.0086 |
Output | ~1,000 | $0.30 / 1M tokens | $0.0003 |
Total | - | - | $0.0089 |
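Each cost in the table is tokens × (price per million tokens) / 1,000,000. The sketch below reproduces the rows from the table's own token counts and prices. As a cross-check, Google's Gemini API documentation says audio is tokenized at 32 tokens per second, which gives 115,200 tokens for an hour; the table's 115,329 is close to that, and the small remainder is presumably the text prompt, though that part is a guess.

```python
# Reproduce the Gemini Flash cost table from token counts and per-million-token prices.
# Token counts and prices are the figures from the table; the 32 tokens/second
# audio rate is the one given in Google's Gemini API documentation.

AUDIO_TOKENS_PER_SECOND = 32  # documented Gemini audio tokenization rate

def token_cost(tokens: float, price_per_million: float) -> float:
    """Dollar cost of `tokens` tokens billed at `price_per_million` dollars per 1M tokens."""
    return tokens * price_per_million / 1_000_000

input_tokens = 115_329  # input tokens for one hour of audio (from the table)
output_tokens = 1_000   # approximate transcript tokens per hour (from the table)

input_cost = token_cost(input_tokens, 0.075)  # Flash input price
output_cost = token_cost(output_tokens, 0.30)  # Flash output price
total_cost = input_cost + output_cost

print(f"Audio-only estimate: {AUDIO_TOKENS_PER_SECOND * 3600:,} tokens/hour")
print(f"Input:  ${input_cost:.4f}/hour")
print(f"Output: ${output_cost:.4f}/hour")
print(f"Total:  ${total_cost:.4f}/hour")
```

With these inputs it prints 115,200 tokens/hour for the audio-only estimate and $0.0086, $0.0003 and $0.0089 for the three cost rows, matching the table.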
That's about 75% of the price of Whisper Turbo on DeepInfra ($0.0089 vs. $0.012 per hour). This could be a real business, especially if you use Gemini's batching feature. If you do make some money from this, let me know; my contacts are on the home page.
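The 75% figure depends heavily on the ~1,000 output tokens per hour, since output tokens cost four times as much as input tokens. A minimal sketch of that sensitivity, using only the prices and token counts from this post: it computes the Gemini-to-Whisper cost ratio, plus the break-even number of output tokens per hour at which Gemini would cost the same as Whisper Turbo. The function and constant names are illustrative.

```python
# Compare Gemini Flash with Whisper Turbo using the hourly figures from this post,
# and find how many output tokens per hour Gemini can emit before it stops being cheaper.

WHISPER_TURBO_PER_HOUR = 0.0002 * 60           # DeepInfra price: $0.012/hour
GEMINI_INPUT_PER_HOUR = 115_329 * 0.075 / 1e6  # ~$0.0086/hour of audio input
OUTPUT_PRICE_PER_TOKEN = 0.30 / 1e6            # Flash output price per token

def gemini_hourly_cost(output_tokens: float) -> float:
    """Gemini Flash cost for one hour of audio that yields `output_tokens` transcript tokens."""
    return GEMINI_INPUT_PER_HOUR + output_tokens * OUTPUT_PRICE_PER_TOKEN

# Ratio at the ~1,000 output tokens/hour assumed in the table.
ratio = gemini_hourly_cost(1_000) / WHISPER_TURBO_PER_HOUR

# Output tokens at which Gemini's hourly cost equals Whisper Turbo's.
break_even_tokens = (WHISPER_TURBO_PER_HOUR - GEMINI_INPUT_PER_HOUR) / OUTPUT_PRICE_PER_TOKEN

print(f"Gemini / Whisper: {ratio:.0%}")
print(f"Break-even output: {break_even_tokens:,.0f} tokens/hour")
```

With the post's numbers this prints 75% and a break-even of about 11,168 output tokens per hour. Because transcript length scales with how much is actually said, it is worth measuring output tokens on your own audio before relying on the comparison.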