Research and discovery

To ground our design in real user needs, we conducted targeted research with enterprise translation and localisation teams. Methods included:

13 interviews with clients (pharmaceuticals, finance, government)
Internal workshops with linguists and CAT tool users
Competitive analysis: DeepL, Google Cloud STT, Gladia, Speechmatics

Insights

Key insights from the first round of enterprise interviews included:

Importance of metadata

Metadata (e.g., speaker tags, pauses, file duration) was crucial for quality control.

Time-stamped segments

Post-editors wanted time-stamped segments for better alignment with the source audio.

Post-editing review tools

Trust hinged on editable, reviewable output, not just the final translation.

The old post-editing interface lacked straightforward segment editing and linguistic tool integration.

UX strategy

Our design strategy focused on usability, transparency, and post-editability. Our goals included:

Seamless audio upload + STT translation flow for non-technical users
Editable, aligned transcripts with time markers and optional audio playback
Integrated CAT interface for editing and linguistic review
Control and feedback: flag errors, accept suggestions, correct transcriptions

Design Process

Phase 1: Mapping the workflow

To begin, I mapped out the end-to-end speech-to-translation pipeline. The process would start with uploading an audio or video file, followed by automatic speech-to-text transcription. Once transcribed, the text would be machine translated into the target language. From there, users could post-edit both the original transcript and the translated output using a CAT-style interface. The final step would be exporting or publishing the edited content in the desired format.

This flow was intentionally aligned with Systran's existing translation environment, ensuring that speech-originated content could be managed with the same control and granularity as traditional documents.
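The five-stage flow described above can be written down as an ordered set of stages. The following is a minimal sketch for illustration only: the `Stage` names and the `next_stage` helper are assumptions introduced here, not part of Systran's product or codebase.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Stages of the speech-to-translation flow, in the order described."""
    UPLOAD = "upload"          # audio or video file is uploaded
    TRANSCRIBE = "transcribe"  # automatic speech-to-text
    TRANSLATE = "translate"    # machine translation into the target language
    POST_EDIT = "post_edit"    # CAT-style editing of transcript and translation
    EXPORT = "export"          # export or publish in the desired format


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the last one."""
    stages = list(Stage)  # Enum members iterate in definition order
    index = stages.index(current)
    return stages[index + 1] if index + 1 < len(stages) else None
```

Modelling the flow as a fixed sequence mirrors the way the text describes it: each file moves forward one stage at a time, and post-editing always sits between machine translation and export.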

Phase 2: Wireframes & concept testing

With the workflow in place, I created wireframes to explore and validate the key components of the UI. These included a drag-and-drop audio upload interface with metadata preview, and a transcript pane that displayed editable, time-stamped text blocks. I also designed timeline-based navigation so users could jump to specific audio moments tied to each transcript segment. To support the post-editing stage, we introduced a familiar CAT panel showing source and target text side by side, complete with segment status and glossary support.

We tested these wireframes with internal linguists and two external enterprise clients, which provided valuable insights that shaped the final prototype.
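To make the link between transcript segments and the audio timeline concrete, here is a hedged sketch of one possible data model. The `Segment` fields, the status values, and the `segment_at` helper are hypothetical names chosen for this example; the actual product may structure this differently.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: float            # seconds from the start of the audio
    end: float              # seconds; exclusive upper bound of the segment
    source_text: str        # transcribed text
    target_text: str = ""   # machine-translated text, editable later
    status: str = "draft"   # assumed values: "draft", "edited", "validated"


def segment_at(segments: List[Segment], time_s: float) -> Optional[Segment]:
    """Return the segment whose time range contains `time_s`, if any.

    Assumes `segments` is sorted by start time and segments do not overlap.
    Returns None when the time falls in a pause or outside the audio.
    """
    starts = [s.start for s in segments]
    i = bisect_right(starts, time_s) - 1  # last segment starting at or before time_s
    if i >= 0 and segments[i].start <= time_s < segments[i].end:
        return segments[i]
    return None
```

With a structure like this, clicking a point on the timeline can highlight the matching transcript block, and selecting a block can seek playback to its `start` value.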

An early lo-fi iteration

MVP Design Highlights

Usability Testing

In the second round of research, we conducted usability testing with a group of target users to gather feedback on the initial designs. This round was crucial for identifying pain points and refining the user experience before moving to development.

Row Editing & Validation

Users expressed a strong need to edit rows and validate changes using the keyboard arrow keys, which would speed up their workflow and reduce friction.

File Upload Error Management

Users found that when an upload failed, there was insufficient feedback about the error. They needed clearer signalling to indicate what went wrong, with specific error messages explaining the cause (e.g., file size limits, unsupported formats).

Improved error display: usability testing showed that users preferred messages displayed in an expanded row rather than in a tooltip on hover.
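The kind of specific upload feedback users asked for could be produced by a small validation step such as the one below. This is an illustrative sketch: the allowed formats, the size limit, the message wording, and the `upload_error` function are all assumptions, not the values or code used in the shipped product.

```python
from pathlib import Path
from typing import Optional

# Hypothetical limits, chosen only for illustration.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".mp4"}
MAX_SIZE_BYTES = 500 * 1024 * 1024  # 500 MB


def upload_error(filename: str, size_bytes: int) -> Optional[str]:
    """Return a specific, user-facing error message, or None if the file is acceptable."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        allowed = ", ".join(sorted(ALLOWED_EXTENSIONS))
        return f"Unsupported format '{ext or 'none'}'. Supported formats: {allowed}."
    if size_bytes > MAX_SIZE_BYTES:
        limit_mb = MAX_SIZE_BYTES // (1024 * 1024)
        return f"File is too large. The limit is {limit_mb} MB."
    return None
```

Returning the message as plain text keeps it independent of how it is shown, so the same string can be rendered in an expanded row, which is the presentation users preferred in testing.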

Outcome and impact

Enabled a new product vertical for Systran, opening up opportunities with existing and prospective enterprise clients.

Streamlined previously manual workflows, reducing the time required to translate media content by up to 40%.

Enhanced product value for key accounts in national security, e-commerce, and banking, positioning Systran more competitively in a growing market.

Reflections

This project pushed me to design for a complex, cross-domain workflow — where speech-to-text tech, machine translation, and linguistic tooling intersect. Key lessons:

Trust over speed

Enterprise users valued accuracy and control more than fast results. Designing for delayed but high-quality processing meant focusing on confidence, clarity, and editability, not just speed.

Transparency + Control

Users wanted to see how the output was generated and make corrections easily. Exposing system decisions and enabling in-context editing built trust and usability.

Linguist-Centric Design

Professional users have complex, detail-heavy workflows. Instead of simplifying, we focused on making complexity manageable — supporting segments, metadata, and glossary alignment.

Systran: MVP Launch for Speech-to-text Translation Tool

Designing a post-editable workflow from voice to multilingual content

Overview

The goal

Timeline

Team

The problem