Systran is a leader in AI-driven translation solutions, historically focused on written content. As demand for multilingual voice content surged across industries, Systran explored how to integrate speech-to-text translation into its existing enterprise translation platform.
The goal was to let users upload audio files, transcribe them, translate the text, and refine it using familiar CAT (computer-assisted translation) tools and post-editing workflows. As Product Designer, I led the MVP design of this new feature set, bridging voice input with structured translation environments.
Timeline: 12 weeks (discovery to MVP-ready prototype)
Team: myself (Product Designer), 2 PMs, 2 NLP engineers, 3 front-end developers, and 2 QA engineers
Clients increasingly requested speech file translation — especially for marketing collateral, training materials, webinars, and recorded meetings. However:
They expected live translation, which wasn’t feasible yet
They needed high translation quality and linguistic control
Existing workflows were optimized for text, not audio-originating content
The UX challenge was to set the right expectations, deliver transparency, and fit within existing translator tools and behaviors.
To ground our design in real user needs, we conducted targeted research with enterprise translation and localisation teams. Methods included:
Key insights from the first round of enterprise interviews included:
Metadata (e.g., speaker tags, pauses, file duration) was crucial for quality control
Post-editors wanted time-stamped segments for better alignment with source audio
Trust hinged on editable, reviewable output — not just final translation
The old post-editing interface lacked straightforward segment editing and linguistic tool integration
Our design strategy focused on usability, transparency, and post-editability. Our goals included:
To begin, I mapped out the end-to-end speech-to-translation pipeline. The process would start with uploading an audio or video file, followed by automatic speech-to-text transcription. Once transcribed, the text would be machine translated into the target language. From there, users could post-edit both the original transcript and the translated output using a CAT-style interface. The final step would be exporting or publishing the edited content in the desired format.
This flow was intentionally aligned with Systran's existing translation environment, ensuring that speech-originated content could be managed with the same control and granularity as traditional documents.
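The five-step flow can be read as a simple linear state model, which is one way to communicate it to engineering. The following TypeScript sketch is illustrative only: the stage names, the `MediaJob` type, and the `nextStage` and `advance` helpers are hypothetical and do not describe Systran's actual implementation.

```typescript
// Illustrative sketch only: stage names and types are hypothetical,
// not Systran's actual API.
type Stage = "upload" | "transcribe" | "translate" | "postEdit" | "export";

// The five steps of the speech-to-translation flow, in order.
const STAGES: readonly Stage[] = ["upload", "transcribe", "translate", "postEdit", "export"];

interface MediaJob {
  fileName: string;
  sourceLang: string;
  targetLang: string;
  stage: Stage; // the step the job is currently in
}

// Returns the step that follows `current`, or null after the final (export) step.
function nextStage(current: Stage): Stage | null {
  const i = STAGES.indexOf(current);
  if (i === -1 || i === STAGES.length - 1) return null;
  return STAGES[i + 1] ?? null;
}

// Moves a job forward by exactly one step, returning a new object.
// Throws if the job has already reached the export step.
function advance(job: MediaJob): MediaJob {
  const next = nextStage(job.stage);
  if (next === null) {
    throw new Error(`${job.fileName} has already reached the final step`);
  }
  return { ...job, stage: next };
}

const job: MediaJob = { fileName: "webinar.mp4", sourceLang: "en", targetLang: "fr", stage: "upload" };
const afterUpload = advance(job);
console.log(afterUpload.stage); // transcribe
```

Keeping post-editing as its own explicit step, rather than folding it into translation, reflects the intent that speech-originated content gets the same review control as a traditional document.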
With the workflow in place, I created wireframes to explore and validate the key components of the UI. These included a drag-and-drop audio upload interface with metadata preview, and a transcript pane that displayed editable, time-stamped text blocks. I also designed timeline-based navigation so users could jump to specific audio moments tied to each transcript segment. To support the post-editing stage, we introduced a familiar CAT panel showing source and target text side by side, complete with segment status and glossary support.
We tested these wireframes with internal linguists and two external enterprise clients, which provided valuable insights that shaped the final prototype.
An early lo-fi iteration
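The time-stamped transcript pane and timeline navigation both depend on mapping an audio position to the segment that covers it. Here is a minimal TypeScript sketch, assuming segments are sorted by start time and do not overlap. The `Segment` fields and the `segmentAt` helper are hypothetical, chosen to mirror the metadata users asked for (speaker tags, pauses, timestamps) rather than a documented data model.

```typescript
// Hypothetical shape of one time-stamped transcript segment.
type SegmentStatus = "draft" | "translated" | "validated";

interface Segment {
  id: number;
  startMs: number; // inclusive start in the source audio
  endMs: number;   // exclusive end in the source audio
  speaker: string; // speaker tag from transcription metadata
  source: string;  // transcript text
  target: string;  // machine-translated, then post-edited, text
  status: SegmentStatus;
}

// Binary search for the segment covering `timeMs`.
// Returns its index, or -1 when the playhead falls in a pause between segments.
// Assumes `segments` is sorted by startMs with no overlaps.
function segmentAt(segments: readonly Segment[], timeMs: number): number {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const seg = segments[mid]!;
    if (timeMs < seg.startMs) hi = mid - 1;
    else if (timeMs >= seg.endMs) lo = mid + 1;
    else return mid;
  }
  return -1;
}

const segments: Segment[] = [
  { id: 1, startMs: 0, endMs: 4200, speaker: "S1", source: "Hello, everyone.", target: "Bonjour à tous.", status: "translated" },
  { id: 2, startMs: 5000, endMs: 9800, speaker: "S2", source: "Let's get started.", target: "Commençons.", status: "draft" },
];

console.log(segmentAt(segments, 6000)); // 1
console.log(segmentAt(segments, 4500)); // -1 (pause between segments)
```

In this model, clicking a row would seek the player to that segment's `startMs`, and calling `segmentAt` on each playback time update would highlight the active row. A gap between one segment's `endMs` and the next `startMs` represents a pause, which is why the lookup can return -1.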
In the second round of research, we conducted usability testing with a group of target users to gather feedback on the initial designs. This round was crucial for identifying pain points and refining the user experience before moving to development.
Users expressed a strong need to edit rows and validate changes using the keyboard arrow keys, which would speed up their workflow and reduce friction.
Users found that when an upload failed, there was insufficient feedback about the error. They needed clearer signalling to indicate what went wrong, with specific error messages explaining the cause (e.g., file size limits, unsupported formats).
Improved error display: usability testing showed that users preferred error messages shown in an expanded row rather than in a tooltip on hover.
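The error-feedback finding implies a validation step that returns a specific cause for each failed upload instead of a generic failure flag. A minimal TypeScript sketch follows; the 500 MB limit and the accepted extensions are placeholder assumptions, not the product's real constraints, and `validateUpload` and `describeError` are hypothetical names.

```typescript
// Placeholder limits for illustration only; the product's real values are not stated.
const MAX_BYTES = 500 * 1024 * 1024;
const ACCEPTED_EXTENSIONS: readonly string[] = ["mp3", "wav", "m4a", "mp4"];

// Each failure carries the data needed to explain its cause to the user.
type UploadError =
  | { kind: "unsupportedFormat"; extension: string }
  | { kind: "tooLarge"; sizeBytes: number; maxBytes: number };

// Checks a file's name and size; returns every problem found (empty array means valid).
function validateUpload(fileName: string, sizeBytes: number): UploadError[] {
  const errors: UploadError[] = [];
  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!ACCEPTED_EXTENSIONS.includes(extension)) {
    errors.push({ kind: "unsupportedFormat", extension });
  }
  if (sizeBytes > MAX_BYTES) {
    errors.push({ kind: "tooLarge", sizeBytes, maxBytes: MAX_BYTES });
  }
  return errors;
}

// Converts bytes to whole megabytes for display.
function toMb(bytes: number): number {
  return Math.round(bytes / (1024 * 1024));
}

// Produces a message that names the cause, intended for the expanded row
// rather than a hover tooltip.
function describeError(error: UploadError): string {
  switch (error.kind) {
    case "unsupportedFormat": {
      const accepted = ACCEPTED_EXTENSIONS.join(", ");
      return error.extension
        ? `.${error.extension} files are not supported. Accepted formats: ${accepted}.`
        : `The file has no extension. Accepted formats: ${accepted}.`;
    }
    case "tooLarge":
      return `The file is ${toMb(error.sizeBytes)} MB; the limit is ${toMb(error.maxBytes)} MB.`;
  }
}

const problems = validateUpload("all-hands.MOV", 600 * 1024 * 1024);
problems.forEach((p) => console.log(describeError(p)));
```

Returning every problem at once, rather than stopping at the first, lets the expanded row list all causes together so the user can fix the file in one pass.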
Enabled a new product vertical for Systran, opening up opportunities with existing and prospective enterprise clients.
Streamlined previously manual workflows, reducing the time required to translate media content by up to 40%.
Enhanced product value for key accounts in national security, e-commerce, and banking, positioning Systran more competitively in a growing market.
This project pushed me to design for a complex, cross-domain workflow — where speech-to-text tech, machine translation, and linguistic tooling intersect. Key lessons:
Enterprise users valued accuracy and control more than fast results. Designing for delayed but high-quality processing meant focusing on confidence, clarity, and editability, not just speed.
Users wanted to see how the output was generated and make corrections easily. Exposing system decisions and enabling in-context editing built trust and usability.
Professional users have complex, detail-heavy workflows. Instead of simplifying, we focused on making complexity manageable — supporting segments, metadata, and glossary alignment.