Analysis calculates the character and word counts in selected files and identifies average characters per word, repetitions, non-translatables, translation memory matches, internal fuzzy matches and machine translation suggestions. Analysis can also show the number of revisions made by a reviewer.
Analyses can be created by Project managers or Administrators. Linguists cannot be allowed to run their own analyses. Vendors may create analyses for shared jobs/projects.
Some CAT tools refer to analysis as statistics.
Organizational analytics are provided by the analytics dashboard.
Since different billing units are used in different countries, three calculation methods are available:
-
Characters
Without spaces.
-
Words
For languages that use spaces between words—excluding Chinese, Japanese, and Thai.
-
Pages
1800 characters with spaces—unrelated to the actual number of pages in a file.
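The character and page units above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, "without spaces" is assumed to exclude every whitespace character, and the page count is assumed to be a plain fraction with no rounding.

```python
# Minimal sketch of the character and page billing units.
# Assumptions (not confirmed by the source): all whitespace is excluded
# from the character count, and pages are not rounded.

PAGE_SIZE = 1800  # characters with spaces per page, as stated above


def count_characters(text: str) -> int:
    """Count characters, skipping whitespace."""
    return sum(1 for ch in text if not ch.isspace())


def count_pages(text: str) -> float:
    """Count pages as characters with spaces divided by 1800."""
    return len(text) / PAGE_SIZE


if __name__ == "__main__":
    sample = "I bought a new car."
    print(count_characters(sample), count_pages(sample))
```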
Due to different counting methods across different languages, word count as presented may not be the same as word counts produced by other applications.
-
Each join tag is replaced with one space.
-
Other tags are removed.
In languages using a whitespace for separating words (e.g., English):
-
Each number sequence, including the characters +-,. is replaced with one character (using the regular expression [+-]?[0-9]+([., -]?[0-9]++)*+).
-
Each sequence of whitespaces is replaced with one space.
-
Whitespaces at the beginning and the end of segment are removed.
-
Each sequence of characters different from space is counted as one word.
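The steps for whitespace-separated languages can be sketched in Python. This is an illustration under stated assumptions, not the product's implementation: the function name `count_words_ws` is hypothetical, the input is assumed to be already free of tags, and the possessive quantifiers in the documented expression are rewritten as ordinary greedy ones, which should match the same text in this pattern.

```python
import re

# Greedy rewrite of the documented expression
# [+-]?[0-9]+([., -]?[0-9]++)*+ (possessive quantifiers dropped).
NUMBER = re.compile(r"[+-]?[0-9]+(?:[., -]?[0-9]+)*")


def count_words_ws(segment: str) -> int:
    """Count words in a tag-free segment of a whitespace-separated language."""
    # Replace each number sequence with a single placeholder character.
    text = NUMBER.sub("0", segment)
    # Collapse each whitespace sequence to one space and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Each sequence of non-space characters counts as one word.
    return len(text.split(" ")) if text else 0


if __name__ == "__main__":
    print(count_words_ws("I bought a new car."))
```

Because the separator class in the expression includes a space, a grouped number such as "1 000,50" collapses to a single word in this sketch.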
In languages not using whitespace for separating words (e.g., Japanese):
-
Some punctuation marks are removed from the text (using the regular expression [\u2000-\u206F\u2E00-\u2E7F\u3000-\u3004\u3006-\u301F\\p{P}]).
-
Segment is split into sequences of characters belonging to non whitespace (NWS) Han, Hiragana, Katakana, and Thai scripts and sequences of characters not belonging to those scripts (WS).
-
Total number of words = (number of words from NWS) + (number of words from WS).
-
Number of words from WS is computed as for English.
-
Number of words for NWS is number of characters without whitespaces.
Note
Characters from CJK languages are counted as both characters and words.
To create an analysis, follow these steps:
-
From a project page, select one or more jobs.
-
Click Analyze.
The analysis window opens.
-
Select an analysis type from the dropdown list.
-
Provide a name if required.
-
Available macros for Analysis naming:
-
{projectName}
-
{sourceLang}
Adds source language
-
{targetLang}
Adds target language. If multiple languages are analyzed, the language will be empty.
-
{userName}
Adds username of the assigned Linguist or Vendor. If multiple Linguists are assigned, the name will be empty.
-
{workflow}
-
{innerId}
-
{fileName}
If multiple files/jobs are used for the analysis, {fileName} will be empty.
-
-
-
Select analysis options. In particular:
-
Applying the option that excludes numbers affects the word count, as numbers will not be counted as words.
-
The Include IF option compares segments in the analyzed job for similarities within the file, as opposed to only comparing them against a TM.
If Separate IF is checked, internal fuzzy matches are displayed as a separate category in newly created analyses.
For example, a translation job with 10 source words includes the following segments, where only the last character differs:
-
I bought a new car.
-
I bought a new car!
In case no matches are found in the TM, a default analysis will display:
| IF options | TM category: 0%-49% | TM category: 95%-99% | IF category: 95%-99% |
| --- | --- | --- | --- |
| Include IF disabled | 10 words | | |
| Include IF enabled + Separate IF disabled | 5 words | 5 words | |
| Include IF + Separate IF enabled | 5 words | | 5 words |
-
-
-
Click Analyze.
The analysis or analyses are added to the list.
-
Click on an analysis in the list to view it in a simple table or download it for rendering in a project management application.
Note
Analysis options can be set when creating an analysis, at the project level, or globally under Settings.
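The naming macros listed in the steps above behave like simple placeholders. A minimal sketch of that expansion follows, assuming a hypothetical `expand_name` helper and a plain dictionary of values. The only behavior taken from the text is that a macro with several possible values (for example, multiple target languages) expands to an empty string.

```python
import re

# Macro names as listed in the steps above.
MACRO = re.compile(
    r"\{(projectName|sourceLang|targetLang|userName|workflow|innerId|fileName)\}"
)


def expand_name(template: str, values: dict) -> str:
    """Expand naming macros; multi-valued or missing macros become empty."""
    def replace(match: re.Match) -> str:
        value = values.get(match.group(1), "")
        if isinstance(value, (list, tuple)):
            # More than one language, linguist, or file: the macro is empty.
            value = value[0] if len(value) == 1 else ""
        return str(value)

    return MACRO.sub(replace, template)


if __name__ == "__main__":
    print(expand_name("{projectName}_{sourceLang}-{targetLang}",
                      {"projectName": "Website", "sourceLang": "en",
                       "targetLang": ["de"]}))
```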
Three analysis types are provided:
Default analysis is the standard analysis run on source segments before translation. It provides the baseline analysis of a job that can be used with the Post-editing Analysis to determine how much effort was put into translating that job. This baseline is also used as the basis for generating quotes for clients.
A breakdown of segment, word, and character counts is produced. If a TM is used in the project, TM matches are identified along with non-translatables, internal fuzzy matches, and QPS (if enabled).
Running a Default Analysis after translation produces incorrect analyses.
Post-editing analysis is run on target segments and indicates editing effort: how much editing the text required from a linguist or proofreader. It is run after post-editing is complete.
When a linguist clicks on an untranslated segment, the current highest translation memory match, machine translation suggestion, and/or non-translatable is saved for that segment and is used in post-editing analysis.
Post-editing analysis can be launched from any workflow step and is calculated as the difference between the text inserted from an available source (e.g., TM/MT) and the post-edited result in the segment target.
Post-editing analysis extends the traditional translation memory analysis to include machine translation (MT) and non-translatables (NT). Third-party MT engines are also supported.
Disabling Analyze TM post-editing and Analyze NT/MT post-editing does not exclude TM/MT matches from the analysis. In this case, the analysis considers the score of the highest available match instead of the post-editing effort.
Post-analysis options
Post-editing options are used for calculating the post-editing effort required for matches from the translation memory (TM), non-translatables (NT) and machine translation (MT).
Analyze TM post-editing enabled
-
Intended for low-quality TMs that contain high percent matches that require Linguist editing.
-
Indicates post-editing effort for the TM.
-
Contains only 100% matches in the analysis. In-context 101% matches from the TM have no effect on the calculation.
Analyze TM post-editing disabled
-
Intended for high-quality TM where matches should be edited as little as possible to reduce cost.
-
Indicates both 101% and 100%.
-
Indicates TM matches offered to the Linguist when the segment is opened (not the actual Linguist's post-editing effort).
-
Indicates post-editing effort for machine translation and non-translatables.
Analyze NT/MT post-editing enabled
-
If the MT or NT suggestion was accepted without further editing it is presented as a 100% match in the analysis.
-
If the Linguist changes the MT, the match rate will be lower. The score-counting algorithm is the same as that used to calculate the score of translation memory fuzzy matches.
-
Editing of an NT will cause the segment to be presented as 0-49% NT.
Analyze NT/MT post-editing disabled
-
Entries from MT/NT without any estimated score will be considered TM 0%-49% matches. They will be indicated as translated by the Linguist with the MT not considered.
-
QPS and Phrase Language AI matches higher than 75% will be in the MT column in their respective matches.
-
Indicates NT/MT matches offered to the Linguist when the segment is opened (not the actual Linguist's post-editing effort).
Automatically generate post-editing analysis before a source update
-
Analysis is created:
-
For each updated job.
-
For each provider individually, and assigned to that provider.
-
-
Analysis is not created if:
-
No linguist or vendor is assigned.
-
-
Analysis counts confirmed and translated segments.
-
Analysis follows the naming convention:
-
UpdateSource #{innerID}{workflow}
-
-
Analysis will be created with Units counted (source), Analyze NT post-editing, Analyze TM post-editing and Analyze MT post-editing selected.
Count units of the
-
Select which word count will be presented in the analysis. A target word count may be higher than a source word count.
Does not affect match scoring.
-
Team, Ultimate and Enterprise plans (Legacy)
Get in touch with Sales for licensing questions.
The Compare analysis feature is only available in projects with workflow steps. It compares two versions of a file in different workflow steps at the segment level and analyzes how the two versions differ. If there are no project-specific settings for the analysis, default settings are used, which may result in incorrect reports.
Analysis can be run on multiple jobs and can be grouped in two ways:
-
Analyze by provider
-
For a project with many jobs assigned to various Linguists or Vendors. Used to:
-
Create separate analyses containing files assigned to individual Linguists or Vendors.
-
Assign analyses to a provider making the analyses visible to their Linguists/Vendors.
Net rate scheme will be pre-selected as an option if one is applied to the provider.
-
-
-
Analyze by language
-
If a project contains multiple target languages, the analyses of all files can be run in a batch creating a separate analysis for each individual language.
To analyze by language, follow these steps:
-
If the source file used for an analysis is updated, it is indicated as being outdated in the analysis table.
Recalculating applies settings used for the original analysis.
Vendors are not allowed to recalculate analyses created by Buyers.
To recalculate using a new source file, follow these steps:
Customize the Analysis view
, , , , and columns can be displayed/hidden in the Analysis table. The column is also available for post-editing analysis and indicates how many seconds were spent editing a segment.
Download an analysis
To download an analysis, follow these steps:
-
Click Download to present the dropdown menu and select:
-
CSV (Comma Separated Values) with or without characters and readable with spreadsheet applications.
-
LOG (Similar to SDL Trados format) and readable with most project management applications.
-
JSON (JavaScript Object Notation), a lightweight data-interchange format.
Only analysis downloaded in JSON format will include a breakdown of NT, MT, TM and internal fuzzies (IF) data per match type.
-
-
Selecting a file type triggers the download.
These files can be imported into most project management software systems.
Apply a net rate scheme
A discount to words/characters/pages can be applied in an analysis. The discounted translation volume is immediately calculated and displayed directly in the analysis in the
row.
To remove the net rate scheme from the analysis, leave the field next to the
button empty.
When a net rate scheme is applied to the analysis, the downloaded file with the analysis shows weighted word counts in each match category.
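How a net rate scheme turns raw counts into weighted counts can be illustrated with a short sketch. The match categories, word counts, and percentages below are invented for the example, and `weighted_counts` is a hypothetical helper; the actual rates come from the scheme applied to the analysis.

```python
# Minimal sketch: weight each match category by a net rate percentage.
# All categories, counts, and rates here are hypothetical example values.


def weighted_counts(counts: dict, rates_percent: dict) -> dict:
    """Multiply each category's count by its rate (percent of full price).

    Categories without a rate are assumed to be charged at 100%.
    """
    return {
        category: count * rates_percent.get(category, 100) / 100
        for category, count in counts.items()
    }


if __name__ == "__main__":
    counts = {"100%": 200, "95%-99%": 100, "0%-49%": 500}
    rates = {"100%": 10, "95%-99%": 30, "0%-49%": 100}
    weighted = weighted_counts(counts, rates)
    print(weighted, sum(weighted.values()))
```

With these example values, 800 raw words are reduced to 550 weighted words.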
To assign an analysis to a provider, follow these steps:
-
Select an analysis from the list and click Edit.
The editing page opens.
-
Select a Provider from the dropdown list.
-
Click Save.
The analysis will be available to the assigned provider on the linguist portal.