Why file type matters in language translations
Clients sometimes don’t realize that file type matters. For example, translating a PDF differs from translating a DOCX, and translating a JPEG differs from translating a PSD. Each file type may have its own software, processes, challenges, and sometimes even vendors. In this article, we categorize file types in a helpful way and describe a few things about each category that affect how a language service provider (LSP) goes about translating it.
File-type categorization mostly comes down to two questions: whether the file type is compatible with a computer-assisted translation (CAT) tool, and what other kinds of specialists are required to make the target file publish-ready. CAT tools can accommodate many file types, and the list grows over time. Whether the file type is proprietary (requires special software to use) affects how accessible the text is and how much additional work is needed to finalize a deliverable. In both cases, how accessible the content is determines the amount of work required to translate it, and accessibility is highly correlated with how compatible the file type is with a CAT tool.
In this article, we will categorize files in the following way:
- Text files
- Software files
- Image files
- Desktop Publishing files
- Media files
- eLearning files
Some of these categories may seem obvious. We start with text files because all the other file types are easier to explain as deviations from text files. By the end, we hope to have demonstrated why file type is one of the largest factors in translation and how it affects process and pricing.
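As a rough illustration of this categorization, the sketch below maps file extensions to the six categories. The extensions come from the examples used in this article, except `.jpg`, `.indd`, `.qxp` and `.pub`, which are our assumptions for JPEG, Adobe InDesign, QuarkXPress and Microsoft Publisher files. The names `CATEGORY_BY_EXTENSION` and `categorize` are hypothetical, and a real project would look at the file contents, not just the extension.

```python
from pathlib import Path

# Illustrative mapping only; it is not an exhaustive list of supported formats.
CATEGORY_BY_EXTENSION = {
    ".docx": "text",
    ".html": "software", ".xml": "software", ".js": "software",
    # PDF is grouped with images because its text must be extracted first.
    ".jpeg": "image", ".jpg": "image", ".gif": "image", ".png": "image", ".pdf": "image",
    # Assumed extensions for InDesign, QuarkXPress and Publisher.
    ".indd": "dtp", ".qxp": "dtp", ".pub": "dtp",
    ".mp3": "media", ".mp4": "media", ".mov": "media", ".wav": "media",
    ".story": "elearning", ".cptx": "elearning", ".awt": "elearning",
}


def categorize(filename):
    """Return the category for a file name, or "unknown" if it is not listed."""
    extension = Path(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(extension, "unknown")


if __name__ == "__main__":
    for name in ["manual.docx", "brochure.PDF", "intro.mp4", "archive.zip"]:
        print(name, "->", categorize(name))
```

An extension that is not in the table falls back to "unknown", which is a reasonable cue to ask the client what the file actually contains.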
Text files
Text files are files in which text is the main type of content, the text does not need a rigid container (such as a text box) in order to be present, and the files are compatible with CAT tools. Text has attributes like font, font size, and font style (bold, underlined, italics, etc.), along with other common features found in Microsoft Office or the Apple iWork suite. PDF files are not included in this category because their text is not natively accessible: it cannot be edited the way text in Microsoft Office or Apple iWork can. Translating text files is the most straightforward case because CAT tools are compatible with these formats and nothing special has to be done to a file to prepare it for a CAT tool. After translation, only a standard quality check is needed to confirm that all content was translated.
Software files
Software files are any files in which the text is “wrapped” in code. HTML, XML and JS files are examples. CAT tools can read software files easily, but because the text is separated from the other digital assets that accompany it and provide context, translation can be challenging. Software files usually need accompanying reference files, or the text needs to be provided in another format, to give the translator proper context. As with eLearning files, it’s important to perform linguistic quality assurance (LQA) and test the final product before release to ensure that all characters and words fit on the LCD screen, machine interface, etc.
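To show what “wrapped in code” means in practice, here is a minimal sketch that pulls the visible text out of an HTML fragment using Python’s standard `html.parser` module. It is not how any particular CAT tool works; the class name, function name and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TranslatableTextExtractor(HTMLParser):
    """Collects visible text segments and skips <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.segments = []
        self._skip_depth = 0  # greater than 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.segments.append(text)


def extract_segments(html):
    """Return the text segments found between tags, in document order."""
    parser = TranslatableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


# Hypothetical sample markup for illustration.
sample = (
    '<div class="panel"><button onclick="save()">Save</button>'
    "<p>Settings were <b>updated</b>.</p>"
    '<script>var x = "skip";</script></div>'
)

if __name__ == "__main__":
    for segment in extract_segments(sample):
        print(repr(segment))
```

Note how the sentence “Settings were updated.” comes out as three separate fragments once the markup is stripped. A translator who sees only those fragments has little context, which is why reference files matter for this category.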
Image files
Image files are flat digital images or graphics. JPEG, GIF, and PNG are examples. On their own, these files are opened with programs like Windows Photo Viewer, Preview or Adobe Photoshop. Since image files technically can’t contain text, only images of text, they can’t be translated directly. The text must first be converted into a format that is accessible to a CAT tool or rekeyed as text in a word processor. For the same reason, we classify PDF files as image files: the text in a PDF needs to be extracted before it can be properly accessed and translated. After translation, the text may need to be formatted to match the style of the source text.
Desktop Publishing files
Desktop Publishing files, or DTP files, include almost all Adobe file types, QuarkXPress, Microsoft Publisher and some others. All DTP files emphasize page layout, so text requires text boxes, and no content asset has a default location on the page. DTP files are accessible to CAT tools either directly or with minimal effort. However, because these files rely heavily on text boxes and text tends to expand or contract when translated into another language, someone must go through each text box on each page after translation to fix overset text and make any other alterations needed for the new file to be functional. This work is called DTP, and because it relies on specialized software, it typically requires a specialist other than the translator. After DTP, a linguist needs to review the file again to confirm that the laid-out text is still linguistically accurate.
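The effect of text expansion can be sketched with a simple character count. The example below is a rough approximation only: real overset depends on font, point size and box dimensions, not on character counts. The `capacity` field, the box names and the function names are hypothetical.

```python
def expansion_ratio(source, target):
    """Length of the translated text relative to the source text."""
    return len(target) / len(source)


def flag_overset(boxes):
    """Return the ids of boxes whose translated text exceeds the assumed capacity."""
    return [box["id"] for box in boxes if len(box["target"]) > box["capacity"]]


# Hypothetical English-to-German text boxes; "capacity" is an assumed
# number of characters that fits in the box at the current font size.
boxes = [
    {"id": "headline", "source": "Settings", "target": "Einstellungen", "capacity": 10},
    {"id": "save_button", "source": "Save", "target": "Speichern", "capacity": 12},
]

if __name__ == "__main__":
    for box in boxes:
        ratio = expansion_ratio(box["source"], box["target"])
        print(f'{box["id"]}: {ratio:.2f}x')
    print("Needs DTP attention:", flag_overset(boxes))
```

Here the headline grows from 8 to 13 characters and no longer fits its box, while the button text more than doubles in length but still fits. A check like this can only point to likely problem spots; the specialist still has to inspect each page.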
Media files
Media files are audio files, video files, or “container” files that hold both. Common file types include MP3, MP4, MOV, WAV and many others. Animations need to be recaptured as text and audio needs to be captured as a transcription in order to be translated. Because of this “conversion” process, media files are not directly compatible with CAT tools. Specialists are needed throughout the process to transcribe with timestamps, provide voice-over audio files, and render video. After translation, the subtitles and/or voice-over need to be compiled back into the media file, after which, as in DTP, a linguist reviews the file again to confirm linguistic accuracy. Media files are some of the most time-consuming projects because they need to be reviewed in full multiple times at normal playback speed to ensure quality.
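A timestamped transcription can be pictured as a list of cues, each with a start time, an end time and a line of text. The sketch below writes such cues in the SubRip (SRT) subtitle layout, which we assume here as one common choice; the article does not prescribe a format. The function names and sample cues are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm, the SRT timestamp layout."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical translated cues for illustration.
cues = [
    (0.0, 2.5, "Willkommen zum Kurs."),
    (2.5, 5.0, "Klicken Sie auf Weiter."),
]

if __name__ == "__main__":
    print(to_srt(cues))
```

Because the timestamps are tied to the source audio, a translation that runs longer than the original line may no longer fit its time slot. That is one reason a linguist reviews the compiled file at normal playback speed.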
eLearning files
eLearning files are essentially a collection of different files that make up an eLearning course, including files specific to eLearning software, like STORY, CPTX or AWT. Having access to the full course build, including all supplementary files, is crucial to keeping the project cost-effective. Text needs to be extracted from everything, including buttons and quizzes. One of the unique traits of eLearning projects is the high number of audio files that accompany the course material, so how the audio files, transcriptions, and on-screen text (OST) are organized becomes crucial. Specialists in eLearning software are again needed to review the course and extract what needs to be translated. After translation, transcription, voice-over, etc., everything needs to be compiled back into the eLearning course. A final linguistic quality assurance (LQA) pass is then done to ensure presentability. eLearning projects require an extensive amount of time to complete because of testing: each module, quiz, and permutation that a user can experience needs to be exercised in the LQA and testing stage to ensure the final course is ready for publication.
Summary
File type matters because it affects the cost and time it takes to render a file into a target language. The more work it takes to convert the file into a form that is usable by a CAT tool, and the more work it takes to make the translated file presentable, the more money and time the translation takes. Text files are the simplest to translate because they are compatible with CAT tools and require no special software or software skills to prepare them for publishing. Media files are the most complex because they can contain text, graphic versions of text, audio and video. As such, they require significant preparation before and after translation. Moreover, even a single change can require re-recording an entire audio track. eLearning projects, which are really collections of many files, most of them media files, are therefore among the most complex. The next time you need a file translated, consider which file type will require the least effort from the LSP to access the text and make it publication quality.
If you’d like to learn more about how BURG Translations helps you ensure high-quality translations, contact us today.