Custom AI

Automated Evaluations

Automated evaluations are provided for every MT model. Click a model name or the ellipsis in the More column to view them.

Phrase Custom AI offers detailed data and visualizations designed to provide a deeper understanding of custom NextMT model quality:

  • The Overview tab provides a summary of the evaluation results, featuring intuitive visualizations and metadata about the MT model.

    • The Performance Comparison table compares the performance of generic versus custom NextMT models across four MT quality metrics (a minimal scoring sketch follows this list). The table has two main sections:

      • Baseline Performance

        Shows automated MT quality scores for Phrase NextMT and a custom NextMT model without TM leverage.

      • RAG Performance

        Shows automated MT quality scores where TM fuzzy matches are leveraged to adapt MT output.

      The Best Engine column highlights the highest-performing model for each metric.

    • The Model metadata panel provides essential information about the evaluated custom NextMT model.

  • The Visualizations tab presents MT evaluation results graphically through donut charts, breaking down evaluated translation segments by quality category (a category-breakdown sketch follows this list).

    • Select the desired MT quality metric from the dropdown menu at the top to benchmark the custom NextMT model against the generic Phrase NextMT model.

    • Hover over a category in a donut chart to view the percentage and number of segments in that category.

  • The Evaluation sample tab presents a segment sample preview from the evaluation set, displaying a list of source segments with relevant baseline and RAG performance scores.

    When a segment is selected, the right panel displays:

    • Segment-specific scores and quality level indicators for baseline and RAG performance.

    • A comparison of the translation output generated by the custom and generic NextMT models against the reference translation from the dataset. Select Show differences in the engine output to highlight where each output diverges from the reference (see the diff sketch after this list).
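
For readers who want to reproduce this kind of comparison outside Phrase, the sketch below scores a generic and a custom output against a reference translation with two common automated MT metrics. The article does not name the four metrics Phrase uses, so chrF and BLEU from the open-source sacrebleu package serve here purely as illustrative stand-ins, and the sentences are hypothetical.

```python
# Minimal sketch of automated MT scoring like the Performance Comparison
# table. chrF and BLEU are stand-ins; the article does not name the exact
# four metrics Phrase computes.
import sacrebleu

# Hypothetical evaluation data: one reference and two engine outputs.
references = ["The quick brown fox jumps over the lazy dog."]
generic_output = ["A quick brown fox jumped over the lazy dog."]
custom_output = ["The quick brown fox jumps over a lazy dog."]

for name, outputs in [("generic", generic_output), ("custom", custom_output)]:
    bleu = sacrebleu.corpus_bleu(outputs, [references])
    chrf = sacrebleu.corpus_chrf(outputs, [references])
    print(f"{name:>7}: BLEU={bleu.score:.1f}  chrF={chrf.score:.1f}")
```

RAG Performance scores in the table would be produced the same way, only over outputs that were first adapted with TM fuzzy matches rather than raw baseline outputs.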
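
The donut-chart breakdown on the Visualizations tab amounts to bucketing per-segment scores into quality categories and reporting counts and percentages. The thresholds and category names below are hypothetical, since the article does not specify Phrase's categories; this is only a sketch of the idea.

```python
# Minimal sketch of bucketing per-segment scores into quality categories.
# Thresholds and category names are hypothetical.
from collections import Counter

def category(score: float) -> str:
    # Hypothetical cut-offs on a 0-100 metric scale.
    if score >= 80:
        return "High quality"
    if score >= 50:
        return "Medium quality"
    return "Low quality"

segment_scores = [92.4, 77.1, 48.9, 85.0, 63.2]  # hypothetical per-segment scores
counts = Counter(category(s) for s in segment_scores)
total = len(segment_scores)
for cat, n in counts.most_common():
    print(f"{cat}: {n} segments ({n / total:.0%})")
```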
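
Show differences-style highlighting can be approximated with a word-level diff against the reference. Phrase's actual rendering is not documented here; this sketch uses Python's standard difflib and brackets the spans of the engine output that diverge from the reference.

```python
# Minimal sketch of "Show differences"-style highlighting using difflib.
import difflib

def highlight_differences(output: str, reference: str) -> str:
    """Bracket spans of the engine output that differ from the reference."""
    ref_words, out_words = reference.split(), output.split()
    matcher = difflib.SequenceMatcher(None, ref_words, out_words)
    pieces = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        words = out_words[j1:j2]
        if not words:  # pure deletion from the reference; nothing to show
            continue
        pieces.append(" ".join(words) if op == "equal" else "[" + " ".join(words) + "]")
    return " ".join(pieces)

print(highlight_differences(
    "The quick brown fox jumps over a lazy dog.",
    "The quick brown fox jumps over the lazy dog.",
))
# -> The quick brown fox jumps over [a] lazy dog.
```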
