Audio Transcription (Studio)


Audio transcription takes audio as input and uses automated speech recognition and automated speaker identification to generate text output. Specifically, the system uses a proprietary instance of the OpenAI Whisper automated speech recognition system.

Monolingual term bases can be created on the Settings page to improve AI transcription accuracy for specialized or difficult terms. Term bases are automatically shared in read-only mode with all users in the same organization.

Phrase Studio consumes Video Localization Hours.

Use cases

A 45-minute customer interview is recorded as an MP4 file.

A text transcript with speaker identification is generated, which can be used to create a case study and pull quotes for a website.

To create an audio transcription project, follow these steps:

From Phrase Studio, click New Project.

The Create new project page opens.
Either drag a file onto the upload field or click Upload file to locate a file on your system.

The uploaded file is displayed.
Optionally, specify the number of Speakers in the uploaded file.
- To set the number of speakers manually, open the dropdown and select a value from 1 to 5. If the file includes more than five speakers, use the default Auto-detect option.
Provide a name for the project and set the project visibility as required:
- New projects are public by default. Public projects are visible to all users in the organization who have access to Studio.
- Deselect Public project to create a private project that is visible only to the project owner. A private project can still be shared with selected users if needed.
Manually select the Source Language or enable Auto-detect source language for automatic detection.
If required, under Localization Options, enable Translate subtitles and select the language(s) into which the file should be translated.
- The translation engine can be configured at the project level in the Automated translation section.
- If Dub into target languages is selected, the file is transcribed, translated, and dubbed in one pass, with no opportunity to review the translation before dubbing.
Select a Subtitle profile to determine subtitle display rules.

Enable Use different subtitle profiles for specific languages to select a profile for each language.
Optionally, to improve text-to-speech accuracy in dubbing workflows, enable Apply pronunciation rules and select existing pronunciations and related pairs.
If required, configure additional options:
- Open the Subtitles section to import existing subtitle files in SRT or VTT format for both source and target languages.
  
  When subtitles are imported, the system skips automatic audio transcription and speaker identification and aligns the existing subtitles with the video. Speakers must be created and assigned manually, because SRT/VTT files do not include speaker information.
- Open the Automated translation section to override the account-level settings and select the preferred Translation engine at the project level.
  - If Phrase Language AI is selected, the MT Profile and Translation Memory dropdown menus are displayed.
    
    Select one of the available MT profiles and, optionally, a TM.
  - If AI Translation Agent is selected, the Translation Memory dropdown menu is displayed.
    
    Select one of the available TMs.
- Open the Resources section to select an existing term base or add terms that will be used to detect and match similar-sounding words during transcription.
- Open the AI-generated summaries and insights section to select the summaries and insights to generate for the uploaded recording and the AI models to use.
Click Create project.

The file is uploaded and is displayed on the My Recordings page.

Click the recording name to open it in the editor. The transcription and translations are shown in the Transcribe and Translation tabs, and both can be edited if required.

Click Download to select the transcription and translations to download to your system. Audio-only tracks can also be downloaded in MP3 format.
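The Subtitles import option accepts standard SRT files, and SRT/VTT files do not carry speaker information. A minimal Python sketch follows. It assumes a well-formed SRT file and parses each cue into its index, start and end times, and text. Those are the only fields an SRT cue has, which is why speakers must be assigned manually after import. The `parse_srt` helper and the sample dialogue are hypothetical illustrations, not part of Phrase Studio.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: str  # "HH:MM:SS,mmm"
    end: str    # "HH:MM:SS,mmm"
    text: str   # no speaker field: SRT cues do not carry one


# SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500"
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(content: str) -> list[Cue]:
    """Hypothetical helper: split an SRT file into cues (assumes well-formed input)."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue  # skip blocks without a valid timing line
        cues.append(Cue(int(lines[0].strip()), match.group(1), match.group(2), "\n".join(lines[2:])))
    return cues


# Hypothetical sample file with two cues
sample = """1
00:00:01,000 --> 00:00:03,500
Thanks for joining us today.

2
00:00:03,600 --> 00:00:06,000
Happy to be here.
"""

for cue in parse_srt(sample):
    print(cue)
```

Each parsed cue has only timing and text. Nothing in it identifies who is speaking, so speaker labels have to be added in the editor.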

AI Summaries

Extracts structured and meaningful insights such as summaries, sentiment, quality flags, or safety issues from subtitles using AI models.

Insights created on the Settings page are automatically shared in read-only mode with all users in the same organization.

Use cases

Summarize customer support calls or identify potentially unsafe or low-quality communication. Phrase Studio returns a summary and flags sections for review.

Speaker Identification

Detects and labels different speakers in an audio file for clearer transcripts and subtitles.

Automatic speaker identification is not available for projects with imported subtitle files.

Use cases

A podcast with multiple participants is processed and each speaker is automatically tagged (e.g., "Speaker 1", "Speaker 2").

Click Manage Speakers under the Transcribe menu to edit speaker names or add speakers.

Use the Combined/Speakers toggle at the bottom of the editor to switch between a single waveform and individual waveforms for each speaker. When multiple speakers are detected, segments can be dragged within a row to reflect overlapping speech, or moved to another row to change the assigned speaker.