Dubbing (Studio)

Dubbing replaces the original audio in a file with a new version in a different language, while preserving the voice characteristics and emotional tone of the original speakers.

Correct speaker assignment is required for dubbing workflows. If using imported subtitle files to skip automatic audio transcription, ensure all speakers are assigned before proceeding to dubbing.

Use cases

A media company wants to distribute a documentary, originally produced in English, in Spain and Japan. The platform's AI identifies each speaker, transcribes and translates the dialogue, and then generates new audio tracks in Spanish and Japanese. The generated voices retain the qualities of the original speakers, creating an authentic viewing experience for the new audiences.

To create dubbing for a file, follow these steps:

From the editor, click the arrow under the Dubbing tab and select Add Dubbing.

The Add Dubbing window opens.
Select the required translated language(s) from the dropdown list.

Note

If a regional language variant (e.g., French (Canada) or Portuguese (Brazil)) is selected, the system automatically falls back to the corresponding base language for dubbing.

The regional language selection is still preserved in the project and metadata.
Optionally, enable Apply pronunciation rules to improve text-to-speech accuracy, then select the existing pronunciations and related pairs to use for dubbing.
Click Add Dubbing.

The window closes and the Dubbing tab indicates when the dubbing is finished.

Click Dub Changes to update the dubbing with any changes made to the text.

Click Manage Voices to select a different voice for the dubbing. For each language, up to 7 recommended voices are automatically suggested based on gender and age matching.
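
The regional-variant fallback described in the note above can be pictured as a mapping from a locale code to its base language. The sketch below is illustrative only: the function name, the returned keys, and the use of codes such as fr-CA are assumptions, not the platform's actual implementation.

```python
def dubbing_language(locale: str) -> dict:
    """Return the locale kept in the project and the base language used for dubbing.

    Hypothetical helper: assumes locales look like "fr-CA" or "pt_BR".
    """
    # Normalize the separator, then keep only the part before the region.
    base = locale.replace("_", "-").split("-", 1)[0].lower()
    return {"project_locale": locale, "dubbing_language": base}


for code in ("fr-CA", "pt-BR", "ja"):
    print(dubbing_language(code))
```

The regional selection is returned unchanged next to the base language, mirroring the statement that the regional choice is preserved in the project and metadata while dubbing uses the base language.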

Audio and dubbing controls

Audio controls make dubbing more expressive and accurate by tagging key vocal nuances that add paralinguistic information. Right-click a segment at the desired position, select Add Audio Controls, and choose one or more of the available options.

Note

Audio controls may not work with all languages or voice models. Testing with the selected voice is recommended to confirm compatibility.

Audio adjustments or redubs after testing do not consume additional minutes.

When a project includes multiple speakers, dubbing settings can be adjusted for each speaker track and for individual segments in the Dubbing tab.

Speaker track controls
- Volume
  
  Click the volume icon next to the desired speaker name in the waveform to adjust the volume slider. The adjustment applies to all segments spoken by that speaker.
Segment controls
- Volume
- Stability
  
  Controls how consistent and predictable the generated voice sounds.
- Similarity
  
  Controls how closely the generated voice matches the selected voice profile.
Select a segment in the timeline to open the Segment Settings panel and adjust dubbing settings as required. Click Save Now at the top right of the Dubbing tab to save the changes.

Re-dubbing is required after adjusting Stability or Similarity. Click Dub Changes to apply the updated settings.

Adjust dubbing speed

By default, dubbing is generated at 1× speed, meaning the system determines the most natural speaking pace based on the amount of text in the segment.

The current speed is displayed as a label on each speech bubble in the timeline.

There are two methods to adjust dubbing speed, if required:

Extend or shorten the speech bubbles on the waveform:

Example

If dubbed audio extends beyond the intended scene at 1× speed, drag the end time of the speech bubble. Dubbing will speed up slightly to fit within the segment and the speed label will be updated accordingly.
Edit the text and re-dub:

Example

If audio significantly overflows at 1× speed, the segment may contain too much text. Shorten or rewrite the text, then click Dub Changes.
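
As a rough way to choose between the two methods, the required speed can be estimated from two durations. This is a sketch under stated assumptions: it assumes the speed label equals the audio length at 1× divided by the speech bubble length, and the 1.2 cutoff between "slightly" and "significantly" is a hypothetical value chosen for illustration. Neither is documented platform behavior.

```python
def required_speed(natural_seconds: float, bubble_seconds: float) -> float:
    """Speed factor needed for audio of natural length to fit the bubble.

    Assumption: speed = duration at 1x / duration of the speech bubble.
    """
    if natural_seconds <= 0 or bubble_seconds <= 0:
        raise ValueError("durations must be positive")
    return natural_seconds / bubble_seconds


def suggest_method(speed: float, max_comfortable: float = 1.2) -> str:
    """Pick one of the two methods; max_comfortable is a hypothetical cutoff."""
    if speed <= max_comfortable:
        return "resize the speech bubble"
    return "edit the text and re-dub"


for natural, bubble in [(6.0, 5.0), (9.0, 5.0)]:
    speed = required_speed(natural, bubble)
    print(f"{speed:.2f}x: {suggest_method(speed)}")
```

Under these assumptions, 6 seconds of audio in a 5-second bubble needs about 1.2× speed, which resizing can absorb, while 9 seconds in the same bubble needs 1.8× and is better handled by shortening the text.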

Pronunciations

Pronunciations control how specific words or phrases are spoken in dubbed audio. They ensure consistent pronunciation of brand names, technical terms, acronyms, and foreign words.

Custom pronunciation pairs can be defined and applied to dubbing workflows during project creation.

For existing projects, pronunciations can also be selected when adding a new dubbing language in the Dubbing tab of the editor.

Pronunciations are automatically shared with all users of the same organization in read-only mode.

Prerequisites

The project's target language supports dubbing.
A dubbing language has been added to the project.

To create a pronunciation, follow these steps:

In the Settings page, select the Pronunciations tab.
Click Create new pronunciation.

The Create Pronunciation window is displayed.
Enter a Pronunciation Name and optional Description.
Select Active to make the pronunciation available for project selection.
Click Save.

The pronunciation is listed in the Pronunciations tab.

Each pronunciation can include multiple pairs for a single language. To add a pronunciation pair, follow these steps:

Select an existing pronunciation and click Create New Pair.
Provide the original Source word or phrase.
Define the desired pronunciation in the Target field.

Use phonetic spelling, syllable breaks, or approximate sounds.
Select the relevant language.
Optionally, click Preview to listen and adjust as required.
Click Save.

The new pair is added to the selected pronunciation.

Existing pronunciations and related pairs can be updated or deleted by selecting Edit next to a pronunciation or a pair.

Re-run dubbing after updating pronunciation pairs; previously rendered audio will not change automatically.

Pronunciation Examples

Brand names
- Apple: ap-pul
- Microsoft: mai-kroh-soft
Technical terms
- GIF: jif
- SQL: ess-cue-el
Foreign words
- Croissant: krwa-san
- Sao Paulo: sow-pow-loo
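
Conceptually, a set of pronunciation pairs behaves like a source-to-target lookup applied to the text before speech is generated. The following sketch shows that idea using the example pairs above. The function name, the whole-word matching, and the case-insensitive comparison are assumptions made for illustration; how the platform actually matches and applies pairs is not described here.

```python
import re

# Example pairs from this section: source text -> phonetic target.
PAIRS = {
    "Apple": "ap-pul",
    "Microsoft": "mai-kroh-soft",
    "GIF": "jif",
    "SQL": "ess-cue-el",
    "Croissant": "krwa-san",
    "Sao Paulo": "sow-pow-loo",
}


def apply_pronunciations(text: str, pairs: dict[str, str]) -> str:
    """Replace each whole-word source with its phonetic target (hypothetical helper)."""
    if not pairs:
        return text
    # Longest sources first so multi-word phrases win over shorter overlaps.
    sources = sorted(pairs, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(s) for s in sources) + r")\b",
        flags=re.IGNORECASE,  # assumption: matching ignores case
    )
    lookup = {source.lower(): target for source, target in pairs.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)


print(apply_pronunciations("Export the GIF from the SQL report in Sao Paulo.", PAIRS))
```

Whole-word matching keeps a pair such as GIF from rewriting part of an unrelated word like "gifted".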

Voice Cloning

Voice cloning generates a synthetic voice based on recordings of a real speaker. The cloned voice can then be used in AI dubbing, allowing translated audio to preserve the tone and vocal characteristics of the original speaker.

The cloning process uses selected speech ranges from uploaded audio or video samples to train the voice model. After samples are processed, a preview can be generated before the voice is saved.

Voice clones are created and managed in the My voices tab of the Settings page. Once created, the voice becomes available for use in dubbing workflows.

Voice sample requirements

- Total selected range duration: between 15 and 180 seconds
- Maximum uploaded file duration: 5 minutes
- Maximum sample files: 3
- Maximum file size: 30 MB per file
- Maximum sample ranges: 50

If voice samples contain multiple speakers, ranges must be used to isolate samples from a single speaker.
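
The limits above can be checked before uploading, for example in a script that prepares sample files. The sketch below is a minimal, hypothetical pre-check: the SampleFile type and validate_samples function are invented for illustration, and it assumes the 50-range limit applies to the total across all files, which the requirements do not state explicitly.

```python
from dataclasses import dataclass, field

# Limits taken from the requirements list.
MIN_TOTAL_RANGE_S = 15
MAX_TOTAL_RANGE_S = 180
MAX_FILE_DURATION_S = 5 * 60
MAX_FILES = 3
MAX_FILE_SIZE_MB = 30
MAX_RANGES = 50  # assumption: counted across all files


@dataclass
class SampleFile:
    """Hypothetical description of one uploaded sample."""
    name: str
    duration_s: float
    size_mb: float
    ranges: list[tuple[float, float]] = field(default_factory=list)


def validate_samples(files: list[SampleFile]) -> list[str]:
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    total_range_s = 0.0
    range_count = 0
    for f in files:
        if f.duration_s > MAX_FILE_DURATION_S:
            problems.append(f"{f.name}: longer than 5 minutes")
        if f.size_mb > MAX_FILE_SIZE_MB:
            problems.append(f"{f.name}: larger than {MAX_FILE_SIZE_MB} MB")
        for start, end in f.ranges:
            if not 0 <= start < end <= f.duration_s:
                problems.append(f"{f.name}: invalid range ({start}, {end})")
                continue
            total_range_s += end - start
            range_count += 1
    if range_count > MAX_RANGES:
        problems.append(f"too many ranges: {range_count} > {MAX_RANGES}")
    if not MIN_TOTAL_RANGE_S <= total_range_s <= MAX_TOTAL_RANGE_S:
        problems.append(
            f"total selected range is {total_range_s:.0f} s; "
            f"must be {MIN_TOTAL_RANGE_S}-{MAX_TOTAL_RANGE_S} s"
        )
    return problems


sample = SampleFile("interview.wav", duration_s=120, size_mb=12, ranges=[(5, 30), (40, 70)])
print(validate_samples([sample]))
```

Returning a list of problems rather than raising on the first failure lets every violated limit be reported in one pass.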

To create a voice clone, follow these steps:

In the My voices tab of the Settings page, click Create new voice.

The Create your voice section is displayed.
Upload samples to generate the voice clone.

Uploaded media appears in a waveform player where ranges can be marked.
Use the timeline to define ranges containing the speaker’s voice, then click Done.

Selected ranges are confirmed. Additional files can be uploaded and processed the same way by clicking Add ranges.

Note

Only the selected ranges are used to generate the voice clone.
Select the consent checkbox and click Confirm.
Click Next.

The Voice details step is displayed.
Enter a Name for the voice. Optionally, provide a Description, Gender, and Labels to categorize it.
Click Next.

The Preview step is displayed. The first preview is generated automatically.
Select a Preview language and provide Preview text.
Click Preview voice.

The audio sample is generated. This typically takes several seconds.
Click Save.

The new voice is added to the My voices list.

Existing voice clones can be previewed or removed from the My voices tab.