Phrase Data is available in two tiers:
-
Basic
-
Team, Professional, Business and Enterprise plans
Get in touch with Sales for licensing questions.
-
Premium
The Premium tier offers the same access as the Basic tier, plus access to segment-level data.
Cloud data warehouses (such as Snowflake) enable customers to securely access their data through an SQL interface.
Phrase Data displays data relevant to the customer's usage of TMS from the date the customer first subscribed to Phrase to the present date. Phrase reserves the right to change the period for which the data is displayed, upon notice to the customer.
Full technical documentation of the Phrase Data integration.
Phrase Data aids strategic decision-making, demonstrates business impact and can justify investments. Data can be combined with broader company metrics to reveal impact on revenue, market penetration, and customer satisfaction:
-
Web traffic, such as page views, bounce rates, and user demographics.
-
Marketing data, such as click-through rates, conversion rates, and social media engagement.
-
Customer support data, to analyze whether localized support materials lead to faster issue resolution and higher customer satisfaction.
-
Sales figures, to compare localization costs with the revenue generated from localized content.
Insights help teams understand the reach and reputation of their content across markets and ensure that a company’s message succeeds across languages.
-
Return on Investment (ROI) of Localization Efforts
Supports maintaining or increasing budgets by proving that investments in translation drive returns. It also highlights cost savings that contribute to overall ROI, such as the money saved over time by using machine translation.
-
Content Adaptation, Efficacy, and User Engagement
Understand how localized content performs with audiences. Strong customer retention in a locale suggests that the translations are meeting user needs. This feedback can influence a team’s quality strategy or prompt style adjustments that resonate better with local users. One customer use case correlates content adaptations with engagement metrics: different prompts are used with AutoAdapt to personalize content based on demographics (e.g. age, gender, region, language), and the performance uplift is then tracked at a local or campaign level.
-
Market and Language Prioritization
Helps decide which languages or regions to focus on for maximum impact, so that the team allocates resources to the translations that yield the greatest business return. It may also guide the sequence of new language rollouts or justify localization for markets where the company wants to grow.
-
Process Optimization & Technology Strategy
Evaluating workflow duration and the impact of automation leads to continuous process improvements and helps in choosing the right technology. Measuring the effect of increased machine translation use on quality and speed (e.g. post-editing time and quality scores) quantifies productivity gains.
-
Internal Performance Benchmarking
Over time, internal benchmarks such as average cost per word, average turnaround for a given content type, and average quality score become strategic targets for improvement. They reveal the results of smart practices and efficiency gains, which further justify localization programs.
In any industry that relies on translation and localization, teams need robust data insights to guide decisions. However, analytics and reporting needs differ by team (Localization, Content, Executive Stakeholders), ranging from operational metrics (day-to-day efficiency, throughput, costs) to strategic insights (long-term ROI, user impact, market growth).
Operational analytics help localization managers streamline workflows and manage costs on a daily basis.
-
Volume (also known as throughput)
Tracking the amount of content translated over time (e.g. words per week/month) indicates the team’s capacity. This helps with resource planning.
-
Timeliness (also known as turnaround time)
Turnaround time measures how long it takes to translate content from start to finish. Localization teams track whether translations are delivered on schedule or face delays. This is crucial for meeting product launch dates, SLAs, and investigating delays.
-
Vendor and linguist performance
If external translation vendors or freelance linguists are used, the localization team will want to evaluate their performance. Metrics like turnaround time per vendor, on-time delivery per vendor and quality scores per linguist are tracked.
-
Quality metrics (Linguistic Quality Assurance)
Measure the percentage of translations that pass QA checks on the first attempt (no rework needed), which indicates effective initial translation. It is also useful to break down the issues found by category and severity, e.g. terminology or accuracy.
-
Cost and efficiency metrics
Teams typically track cost per word and total spend by project, language, or department to ensure budgets are met. They also track savings by comparing the raw translation cost with the discounted cost after applying a net rate scheme, which highlights the value of TM matches and repetitions. For example, if 30% of words in a new project were translated via 100% TM matches, the team can quantify the cost saved by not retranslating those segments.
-
Leverage of Automation (MT and TM)
-
Post-editing effort, such as the average number of edits or the time needed on MT output, to help evaluate MT.
-
TM leverage rate, the % of content covered by TM matches, which indicates TM reuse efficiency and cost savings.
-
MT usage rate, the % of segments initially translated by MT, which shows automation coverage and opportunities for cost/time reduction (Premium only).
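The TM leverage rate and MT usage rate described above can be approximated from segment-level data. The following is a minimal sketch, not an official query: it reuses the segment_statistic_v2 and job_v2 tables and the locale_pair column from the query templates in this article, and it assumes that translation_origin stores lower-case codes such as 'tm' and 'mt'. The output column names are illustrative.

```sql
-- Sketch: TM leverage rate and MT usage rate per locale pair, last 28 days.
-- Assumption: translation_origin holds codes such as 'tm' and 'mt'
-- (LOWER() is applied in case the stored casing differs).
WITH origin_counts AS (
    SELECT
         jv.locale_pair
        ,COUNT(DISTINCT ssv.segment_id) AS total_segments
        ,COUNT(DISTINCT IFF(LOWER(ssv.translation_origin) = 'tm', ssv.segment_id, NULL)) AS tm_segments
        ,COUNT(DISTINCT IFF(LOWER(ssv.translation_origin) = 'mt', ssv.segment_id, NULL)) AS mt_segments
    FROM segment_statistic_v2 ssv
    JOIN job_v2 jv
      ON ssv.job_id = jv.job_id
    WHERE ssv.date_created >= DATEADD('day', -28, CURRENT_DATE)
      AND ssv.date_created < CURRENT_DATE
    GROUP BY jv.locale_pair
)
SELECT
     locale_pair
    ,total_segments
    -- NULLIF guards against division by zero for empty groups
    ,ROUND(100.0 * tm_segments / NULLIF(total_segments, 0), 1) AS tm_leverage_pct
    ,ROUND(100.0 * mt_segments / NULLIF(total_segments, 0), 1) AS mt_usage_pct
FROM origin_counts
ORDER BY total_segments DESC;
```

Counting distinct segments rather than rows follows the convention of the query templates; a word-based rate could be obtained by summing words_processed instead.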
Translation Memory (TM) utilization
Track when a TM was last used and how many segments were reused over time. This helps assess TM freshness, value contribution, and whether outdated TMs are still worth maintaining.
Suggested analysis:
-
Use the segment statistics table (segment_statistic_v2 in the query templates below) and filter on translation_origin = TM.
-
A TM can be identified using the translation_memory_id.
-
Aggregate segments by TM, and use the date_created field to see when the TM was used. It is also useful to calculate the average editing_time_ms and score.
-
Join with the job table (job_v2 in the query templates below) using the job_id to extract language information, e.g. target_locale.
-
Join with a project-level table (for example project_custom_metadata_v2 in the query templates below) using the project_id to contextualize usage by domain, client, business unit, etc.
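The steps above can be sketched as a single query. This is an illustrative starting point rather than an official template: the table names and the locale_pair column are taken from the query templates in this article, while the lower-case 'tm' origin code and a numeric score column are assumptions. Swap in target_locale if the job table exposes it.

```sql
-- Sketch: last use, reuse volume, and post-editing effort per TM and locale pair.
-- Assumptions: translation_origin = 'tm' marks TM-origin segments;
-- score is numeric and can be averaged.
SELECT
     ssv.translation_memory_id
    ,jv.locale_pair
    ,MAX(ssv.date_created)          AS last_used_at
    ,COUNT(DISTINCT ssv.segment_id) AS tm_segments
    ,AVG(ssv.editing_time_ms)       AS avg_editing_time_ms
    ,AVG(ssv.score)                 AS avg_score
FROM segment_statistic_v2 ssv
JOIN job_v2 jv
  ON ssv.job_id = jv.job_id
WHERE LOWER(ssv.translation_origin) = 'tm'
  AND ssv.date_created >= DATEADD('day', -365, CURRENT_DATE)
  AND ssv.date_created < CURRENT_DATE
GROUP BY
     ssv.translation_memory_id
    ,jv.locale_pair
ORDER BY last_used_at ASC; -- least recently used TMs first
```

TMs with no TM-origin segments inside the 365-day window do not appear in the result at all, so the window should be widened when looking for TMs to retire.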
Benefits:
-
Cost savings: Helps sunset unused or low-performing TMs to reduce maintenance overhead.
-
Quality: Identifies aging TMs with high post-editing effort, and flags suitable action.
-
Efficiency: Use high-performing TMs that reduce edit time and increase auto-confirmation. Look into adjusting the TM threshold accordingly.
Machine Translation (MT) engine optimization
Compare MT engines by QPS (quality score per segment) and post-editing time, so that content can be routed to the most suitable MT engine.
Suggested analysis:
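-
Use the segment statistics table (segment_statistic_v2 in the query templates below) with a filter on translation_origin = MT.
-
Join with the MT settings table (machine_translate_setting_v2 in the query templates below) using the machine_translate_setting_id.
-
An MT engine can be identified using the machine_translate_setting_name.
-
Aggregate segments by MT engine and calculate the count of segment_id, the average QPS, and the average editing_time_ms.
-
Join with the job table (job_v2 in the query templates below) using the job_id to extract language information, e.g. target_locale.
-
Join with a project-level table (for example project_custom_metadata_v2 in the query templates below) to break results down by content attributes such as domain, client, business unit, etc.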
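The engine comparison described above could look like the following sketch. It follows the joins used in the query templates in this article. The machine_translate_setting_name column is named in the steps above but does not appear in the templates, so its presence on machine_translate_setting_v2 is an assumption.

```sql
-- Sketch: compare MT engines by segment volume, average QPS, and editing time.
-- Assumption: machine_translate_setting_name is a column of machine_translate_setting_v2.
SELECT
     mtsv.machine_translate_setting_name
    ,jv.locale_pair
    ,COUNT(DISTINCT ssv.segment_id) AS mt_segments
    ,AVG(ssv.qps)                   AS avg_qps
    ,AVG(ssv.editing_time_ms)       AS avg_editing_time_ms
FROM segment_statistic_v2 ssv
JOIN machine_translate_setting_v2 mtsv
  ON ssv.machine_translate_setting_id = mtsv.machine_translate_setting_id
JOIN job_v2 jv
  ON ssv.job_id = jv.job_id
WHERE ssv.date_created >= DATEADD('day', -28, CURRENT_DATE)
  AND ssv.date_created < CURRENT_DATE
  AND ssv.translation_origin = 'mt' -- segments pre-translated by MT
GROUP BY
     mtsv.machine_translate_setting_name
    ,jv.locale_pair
ORDER BY
     jv.locale_pair
    ,avg_qps DESC;
```

Grouping by locale pair as well as engine keeps the comparison fair, since an engine that performs well for one language pair may perform poorly for another.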
Benefits:
-
Lower cost: Identify and use the engines best suited to each content type and language, based on historically low edit rates.
-
Quantify turnaround time: Review editing times per language pair, content type, etc.
Maximize auto-confirm segments (i.e. no touch content)
Identify the QPS threshold above which segments need no editing time and are therefore eligible for auto-confirmation, so that as much content as possible can bypass human review.
Suggested analysis:
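-
Use the segment statistics table (segment_statistic_v2 in the query templates below):
-
Filter where segment translation_origin = MT, is_confirmed = true, editing_time_ms = 0, and confirmation_source = MT.
-
Aggregate segment count (segment_id) and word count (words_processed) by QPS.
-
Join with the job table (job_v2 in the query templates below) using the job_id to extract language information, e.g. target_locale.
-
Join with a project-level table (for example project_custom_metadata_v2 in the query templates below) using the project_id to extract content metadata.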
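A minimal sketch of the filter and aggregation above, under stated assumptions: is_confirmed is treated as a boolean column, the origin and confirmation source codes are assumed to be lower-case 'mt' as in the query templates, and a missing editing_time_ms is treated as zero. The output column names are illustrative.

```sql
-- Sketch: no-touch MT segments and words per QPS value and locale pair, last 28 days.
-- Assumptions: is_confirmed is BOOLEAN; NULL editing_time_ms means no editing took place.
SELECT
     ssv.qps
    ,jv.locale_pair
    ,COUNT(DISTINCT ssv.segment_id) AS no_touch_segments
    ,SUM(ssv.words_processed)       AS no_touch_words
FROM segment_statistic_v2 ssv
JOIN job_v2 jv
  ON ssv.job_id = jv.job_id
WHERE ssv.date_created >= DATEADD('day', -28, CURRENT_DATE)
  AND ssv.date_created < CURRENT_DATE
  AND ssv.translation_origin = 'mt'            -- segments pre-translated by MT
  AND ssv.is_confirmed = TRUE
  AND COALESCE(ssv.editing_time_ms, 0) = 0     -- never edited
  AND LOWER(ssv.confirmation_source) = 'mt'    -- confirmed without human input
GROUP BY
     ssv.qps
    ,jv.locale_pair
ORDER BY ssv.qps DESC;
```

Comparing these counts with the total MT segments at each QPS value shows how much content at that quality level already passes through untouched.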
Benefits:
-
Cost Savings: Find the optimal QPS threshold to increase auto-confirmation and reduce editing time where the quality is already good enough.
-
Reduce turnaround time: Avoid content being unnecessarily reviewed.
Query templates
Sample queries to use as a starting point with the relevant tables:
-- MT-origin segments per QPS value and month, over the last 365 days
SELECT
     ssv.qps
    ,DATE(DATE_TRUNC('month', ssv.date_created)) AS date_month
    ,COUNT(DISTINCT ssv.segment_id) AS mt_segments
FROM segment_statistic_v2 ssv
WHERE ssv.date_created >= DATEADD('day', -365, CURRENT_DATE)
  AND ssv.date_created < CURRENT_DATE
  AND ssv.translation_origin = 'mt' -- segments pre-translated by MT
GROUP BY
     ssv.qps
    ,DATE(DATE_TRUNC('month', ssv.date_created));
-- MT-origin segments per QPS value and project custom field, over the last 28 days
SELECT
     ssv.qps
    ,pcmv.custom_field_name
    ,pcmv.custom_field_values
    ,COUNT(DISTINCT ssv.segment_id) AS mt_segments
FROM segment_statistic_v2 ssv
JOIN job_v2 jv
  ON ssv.job_id = jv.job_id
JOIN project_custom_metadata_v2 pcmv
  ON jv.project_id = pcmv.project_id
WHERE ssv.date_created >= DATEADD('day', -28, CURRENT_DATE)
  AND ssv.date_created < CURRENT_DATE
  AND ssv.translation_origin = 'mt' -- segments pre-translated by MT
GROUP BY
     ssv.qps
    ,pcmv.custom_field_name
    ,pcmv.custom_field_values;
-- MT-origin segments per QPS value and locale pair, over the last 28 days
SELECT
     ssv.qps
    ,jv.locale_pair
    ,COUNT(DISTINCT ssv.segment_id) AS mt_segments
FROM segment_statistic_v2 ssv
JOIN job_v2 jv
  ON ssv.job_id = jv.job_id
WHERE ssv.date_created >= DATEADD('day', -28, CURRENT_DATE)
  AND ssv.date_created < CURRENT_DATE
  AND ssv.translation_origin = 'mt' -- segments pre-translated by MT
GROUP BY
     ssv.qps
    ,jv.locale_pair;
-- MT-origin segments per QPS value and MT engine type, over the last 28 days
SELECT
     ssv.qps
    ,mtsv.machine_translate_setting_type
    ,COUNT(DISTINCT ssv.segment_id) AS mt_segments
FROM segment_statistic_v2 ssv
JOIN machine_translate_setting_v2 mtsv
  ON ssv.machine_translate_setting_id = mtsv.machine_translate_setting_id
WHERE ssv.date_created >= DATEADD('day', -28, CURRENT_DATE)
  AND ssv.date_created < CURRENT_DATE
  AND ssv.translation_origin = 'mt' -- segments pre-translated by MT
GROUP BY
     ssv.qps
    ,mtsv.machine_translate_setting_type;
-- MT-origin segments per QPS value, with the number auto-confirmed, over the last 28 days
SELECT
     qps
    ,COUNT(DISTINCT segment_id) AS mt_segments
    ,COUNT(DISTINCT IFF(LOWER(confirmation_source) IN ('mt', 'tm', 'nt', 'ir', 'ut'), segment_id, NULL)) AS mt_segments_autoconfirmed
FROM segment_statistic_v2
WHERE date_created >= DATEADD('day', -28, CURRENT_DATE)
  AND date_created < CURRENT_DATE
  AND translation_origin = 'mt' -- segments pre-translated by MT
GROUP BY qps;
-- MT-origin segments per QPS value, with the number that were edited, over the last 28 days
SELECT
     qps
    ,COUNT(DISTINCT segment_id) AS mt_segments
    ,COUNT(DISTINCT IFF(COALESCE(editing_time_ms, 0) > 0, segment_id, NULL)) AS mt_segments_edited
FROM segment_statistic_v2
WHERE date_created >= DATEADD('day', -28, CURRENT_DATE)
  AND date_created < CURRENT_DATE
  AND translation_origin = 'mt' -- segments pre-translated by MT
GROUP BY qps;
-- Cumulative words at or above each QPS value, i.e. the words that would skip
-- review if that value were used as the threshold, over the last 28 days
WITH qps_segments AS (
    SELECT
         ssv.qps
        ,SUM(ssv.words_processed) AS words_processed
    FROM segment_statistic_v2 ssv
    WHERE ssv.date_created >= DATEADD('day', -28, CURRENT_DATE)
      AND ssv.date_created < CURRENT_DATE
      AND ssv.translation_origin = 'mt' -- segments pre-translated by MT
    GROUP BY ssv.qps
)
SELECT
     qps AS new_qps_threshold
    ,SUM(words_processed) OVER (ORDER BY qps DESC) AS words_saved
FROM qps_segments;