Speech-to-Text (STT)

Speech-to-Text (STT), also known as automatic speech recognition, converts spoken audio into written text. It powers voice memos, enables subtitles, and connects microphone input to LLM workflows. Today’s systems handle background noise and multiple speakers better than older rule-based phonetic approaches.

Tools such as Descript and ElevenLabs are good starting points; some of their offerings include both STT and text-to-speech (TTS). See multimodal for combined text-audio workflows.


Key characteristics

  • Converts speech into text for transcription, searchability, and downstream analysis.
  • Is valuable in meetings, customer support, interviews, and content production with large audio volumes.
  • Accuracy depends on audio quality, language variant, domain terminology, and multi-speaker handling.
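
One common way to put a number on the accuracy mentioned above is word error rate (WER): the word-level edit distance between a trusted reference transcript and the STT output, divided by the number of reference words. The sketch below is a minimal illustration, not a method taken from any particular tool; the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: text is normalized only by lowercasing and whitespace splitting.
    Real evaluations usually also strip punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One misrecognized domain term out of five reference words.
    print(word_error_rate("please call the pharmacy today",
                          "please call the farm today"))
```

Scoring the same recordings under different conditions (clean versus noisy audio, general versus domain-specific vocabulary) is one way to see how much each factor in the list above affects a given system.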