In October, Microsoft unveiled a transcription feature, Transcribe in Word, that uses cloud-based speech recognition to transcribe audio. After nearly a year in development, Transcribe in Word is now generally available in U.S. English at no cost to existing Microsoft 365 subscribers. It will come to Android and iOS later this year.
You could say Microsoft is late to the party — speech-to-text is hardly novel, after all. But Microsoft project manager Dan Parish says the company is “uniquely positioned” to provide a one-stop shop for transcription. “You don’t have to worry about fussing around with different Windows apps,” he said during a briefing with reporters. “What we’re trying to do with all of our investments in the natural user interface space — whether they have touch or voice, you name it — is enable everyone to work in the way that’s best for them so that they can be more effective, they can spend less time and energy creating the best work, and they can really focus on what matters most.”
Microsoft 365 subscribers using Edge or Chrome will now see a Dictate menu under the Home tab when they create a new Word document from Office.com. Selecting Transcribe will start a recording, which can be paused at any time, while hitting the “Save and transcribe now” button will send the recording to the Azure cloud for transcription. Prerecorded files in .wav, .mp4, .m4a, and .mp3 formats can be uploaded via the new Upload audio option.
Transcripts from recordings and uploaded audio appear in the Transcribe pane once the transcription process is complete, and shortcuts let users quickly insert sections or the entire transcript into the Word document. The time it takes to generate a transcript depends on internet speed and audio file size. Uploaded audio is limited to 200MB per file and five hours per month; recorded audio is unlimited. (Microsoft says it’s considering adding options to extend the cap in the future.) Recordings are stored in the Transcribed Files folder on OneDrive, where they can be renamed or deleted. An editing tool can change the speaker name for a single section, change every occurrence of a speaker label to a name, or fix names and typos.
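The formats and limits quoted above can be read as a simple pre-check before audio is sent for transcription. The sketch below is illustrative only: `check_audio` and its parameters are hypothetical names, not part of any Microsoft API, and it assumes that 200MB means decimal megabytes and that the five-hour cap is counted in minutes of audio.

```python
from pathlib import Path

# Figures quoted in the article. All names here are hypothetical,
# not a Microsoft API.
SUPPORTED_EXTENSIONS = {".wav", ".mp4", ".m4a", ".mp3"}
MAX_FILE_BYTES = 200 * 1000 * 1000  # "200MB", assumed to be decimal megabytes
MONTHLY_CAP_MINUTES = 5 * 60        # "five hours per month"


def check_audio(filename, size_bytes, duration_minutes, minutes_used_this_month=0):
    """Return the reasons a file would fall outside the quoted limits."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 200MB")
    if minutes_used_this_month + duration_minutes > MONTHLY_CAP_MINUTES:
        problems.append("would exceed five hours this month")
    return problems


# A 50-minute MP3 with two hours already used this month passes every check.
print(check_audio("interview.mp3", 45_000_000, 50, minutes_used_this_month=120))
# A 250MB FLAC file with four hours already used fails all three checks.
print(check_audio("lecture.flac", 250_000_000, 90, minutes_used_this_month=240))
```

An empty list means the file fits within the figures reported here; the real service may apply its checks differently.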
When asked about the privacy implications of Transcribe in Word, Parish said Microsoft doesn’t retain recordings or transcription results but instead stores them in users’ personal OneDrive folders. Recordings are only sent to the Azure backend to perform transcription, not for any sort of analysis. Parish also claimed the speech recognition models underpinning Transcribe in Word have been trained on a “diverse” data set to ensure they recognize a range of male and female speakers, including those from different ethnic backgrounds.
Beyond transcription, Word on the web now recognizes basic voice commands like “start list,” “start numbered list,” “add italics/bold/underline,” and “add ellipses/ampersand/percent sign.” The full list lives in the help panel, which can be consulted without leaving transcription mode.
The commands are courtesy of Dictate, an add-on Microsoft retired last October in favor of a native Office 365 web and mobile integration. Dictate supports 29 spoken languages, real-time translation to 60 languages, and two modes of punctuation. It also recognizes natural language commands like “add dot dot dot” (for ellipses), “pause dictation,” and “add comment,” as well as informal commands like “insert smiley face/heart emoji.” (Microsoft says Hindi, Korean, Russian, Polish, Thai, additional Spanish, and additional Chinese locales are on the way.)