Problem

Clients across industries including banking, pharmaceuticals, and e-commerce were increasingly seeking translation tools for spoken content—both audio and video. Previously, Systran had no built-in solution for this, meaning teams had to rely on third-party tools and manual workarounds, resulting in inefficiencies and data fragmentation.

Solution

The solution had to incorporate media management capabilities for handling and processing various file types, as well as integrate with intuitive CAT (Computer-Assisted Translation) tools and post-edition for professional translators. Additionally, the feature needed to support various export options for users to output translations in different formats.

User Interviews

In the initial phase of user research, I worked closely with the product team to conduct user interviews with several translation teams from our B2B clients across industries such as e-commerce, banking, pharmaceuticals and national security. The goal was to gather insights into their unique needs, workflows, and pain points surrounding speech-to-text translation.

Through these interviews, we explored how translation teams currently handle audio and video files, what challenges they face with existing tools, and what features would make their processes more efficient. We uncovered that teams were often struggling with manual transcription for audio and video content, leading to delays and inconsistent results. Translators also expressed a need for more accurate speech recognition in different accents and languages, as well as seamless integration with CAT tools. Facilitating the use of our advanced linguistic tools like User Dictionaries and Neural Fuzzy Adaptation within the speech-to-text workflow was also key.

The old post-edition interface lacked straightforward segment editing and linguistic tool integration

Design Process

The design process for the speech-to-text translation feature was iterative, progressing from low to high-fidelity designs while incorporating continuous feedback from both users and the development team.

I began by sketching out initial concepts and low-fidelity wireframes, focusing on the core functionality and user flows, such as how users would upload audio/video files, initiate the translation process, and manage their media. These early designs prioritised simplicity and clarity, ensuring that the interface was intuitive and easy to navigate.

Next, I moved to high-fidelity mockups, where I refined the visual design, incorporated branding elements, and worked to ensure a cohesive user experience across various screens. During this stage, myself and two product managers also began collaborating closely with the developers through tech framing sessions to discuss the feasibility of specific features. Their feedback on performance, integration, and technical constraints helped refine the designs, ensuring they were not only user-friendly but also technically viable.

An early lo-fi iteration

Technical Limitations

Due to resource and time limitations, the new speech-to-text feature would share the same base component as our pre-existing text file translation feature. This led to tight constraints specifically when designing the post-edition mechanism since the layout for post-editing video files had to keep the same page layout as text files. However, it meant we were able to bring much needed improvements to the post-editor component for all file types.

Usability Testing

In the second round of research, we conducted usability testing with a group of target users to gather feedback on the initial designs. This round was crucial for identifying pain points and refining the user experience before moving to development. We observed how users interacted with the feature, paying close attention to their workflows and reactions to the proposed interface.

Row Editing & Validation

Users expressed a strong need for the ability to edit rows, validate changes using keyboard arrows, which would speed up their workflow. This was especially important for translation teams who needed to quickly navigate through and edit multiple lines of translated text without constantly relying on the mouse.

File Upload Error Management

We also identified an issue with the file upload process. Users found that when an upload failed, there was insufficient feedback about the error. They needed clearer signalling to indicate what went wrong, with specific error messages explaining the cause (e.g., file size limits, unsupported formats). This would help users resolve issues faster and with less frustration.

Improved error display - usability testing showed users preferred messages displayed in row expand rather than in a tooltip on hover

Reflections

Iterative Design

Continuous iteration and feedback loops are vital to refining features. The ability to test low-fidelity designs early, followed by usability testing on higher-fidelity prototypes, allowed us to address key pain points, such as row editing and error feedback, before moving into development.

Prioritising Efficiency

Based on feedback, features like keyboard navigation for editing rows were prioritized to optimize workflow efficiency. Understanding how users interact with the tool in detail informed these adjustments to make their tasks quicker and more intuitive.

Managing Feature Scope

While we received requests for live translation during video calls, we learned the importance of setting realistic expectations with users when resources are limited, and ensuring we focus on delivering the most impactful features within our current capacity.

Speech-to-text Translation Tool

Building Intuitive Workflows for Complex Media Translation Tasks.

Role

Company

Problem

Solution

Results

User Interviews

Design Process

Technical Limitations

Key Features

In-line edition

Retranslate source segment

Synchronised media player