Subtitles Workgroup: Policy Recommendations and Technical Specs
Executive summary
The Subtitles Workgroup brings together accessibility experts, broadcasters, platform engineers, standards bodies, and representatives of deaf and hard-of-hearing communities to recommend pragmatic, implementable policies and technical specifications that improve subtitle quality, consistency, and accessibility across broadcast, streaming, and online video platforms. This document presents clear policy recommendations, technical specifications, implementation guidance, testing procedures, and a phased rollout plan to help organizations adopt best practices for captions and subtitles.
1. Introduction
Subtitles and captions are essential access technologies that enable people who are deaf or hard-of-hearing, non-native speakers, and viewers in noisy or quiet environments to access audio-visual content. Despite decades of standards work (e.g., CEA-708, EBU-TT, IMSC1, WebVTT), real-world subtitle quality and interoperability remain inconsistent. The Subtitles Workgroup seeks to bridge policy and engineering gaps by issuing recommendations that balance accessibility, technical feasibility, localization, and creative intent.
2. Scope and definitions
- Subtitle: text representing spoken dialogue and relevant non-speech audio, intended for viewers who can hear but may not understand the audio (e.g., different language).
- Caption: text representing spoken dialogue and non-speech audio cues for viewers who are deaf or hard-of-hearing; may include speaker identification, sound effects, and music description.
- Closed captions: captions that can be turned on/off by the viewer.
- Open captions: captions burned into the video image and always visible.
- Intralingual subtitles: same-language subtitles (e.g., for literacy or clarity).
- Interlingual subtitles: translated subtitles in a different language.
- Timed text: any standard format (WebVTT, IMSC1, TTML) that synchronizes text with media.
3. Policy recommendations
3.1 Accessibility-first requirement
- All public video content produced or distributed by the organization must include captions or subtitles. This includes live streams, archived content, promotional clips, trailers, and user-generated content on hosted platforms.
3.2 Compliance and legal alignment
- Align organizational policy with applicable local and international regulations (e.g., ADA, UK Equality Act, EU Audiovisual Media Services Directive) and aim to exceed minimum legal requirements where possible.
3.3 Quality benchmarks
- Set measurable quality benchmarks: at least 98% word-level accuracy for pre-recorded content, and no more than a 5% error rate for critical content such as emergency instructions. For live captions, target latency under 3 seconds and a maximum word error rate (WER) of 15% for high-quality human-assisted workflows.
3.4 Language coverage and localization
- Prioritize primary-language captions for markets served; provide interlingual subtitles for major languages representing at least X% of audience (define X per organization). Use professional translators for creative content; machine translation can be used for supplemental languages with human review.
3.5 Metadata and discoverability
- Embed language, role (subtitle vs caption), creator, revision timestamp, and rights metadata in the timed-text files using standard fields (e.g., TTML metadata, WebVTT header comments). Make captions discoverable and indexable for search engines.
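As an illustration, the sketch below (Python) assembles a WebVTT file whose header carries descriptive metadata in NOTE comment blocks. The field names (Language, Role, Revised, Rights) are illustrative rather than a formal schema; an IMSC1/TTML master would carry equivalent values in its metadata elements.

```python
# Sketch only: descriptive metadata carried as WebVTT NOTE comments.
# Field names are illustrative, not a standard schema.
def build_webvtt(cues, language="en", role="caption",
                 revised="2024-01-01T00:00:00Z", rights="internal"):
    header = [
        "WEBVTT",
        "",
        "NOTE",
        f"Language: {language}",
        f"Role: {role}",          # subtitle vs caption
        f"Revised: {revised}",
        f"Rights: {rights}",
        "",
    ]
    body = []
    for start, end, text in cues:
        body += [f"{start} --> {end}", text, ""]
    return "\n".join(header + body)

print(build_webvtt([("00:00:01.000", "00:00:03.500", "Welcome to the programme.")]))
```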
3.6 Open standards and formats
- Adopt open, interoperable formats: use IMSC1/TTML for broadcast and archival, WebVTT for HTML5 web delivery, and provide conversion pipelines between formats. Avoid proprietary, non-standard formats that limit reuse.
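Conversion pipelines can be lightweight for simple cases. The sketch below is a minimal SRT-to-WebVTT conversion in Python, assuming plain SRT input with no styling; production pipelines should use a dedicated timed-text library and preserve positioning, styling, and metadata.

```python
import re

def srt_to_webvtt(srt_text: str) -> str:
    """Minimal SRT -> WebVTT conversion: drop cue numbers and swap the comma
    decimal separator in timestamps for a dot."""
    out = ["WEBVTT", ""]
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if lines and lines[0].strip().isdigit():   # SRT cue index
            lines = lines[1:]
        if not lines:
            continue
        # 00:00:01,000 --> 00:00:03,500  becomes  00:00:01.000 --> 00:00:03.500
        lines[0] = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", lines[0])
        out.extend(lines + [""])
    return "\n".join(out)

example_srt = """1
00:00:01,000 --> 00:00:03,500
Welcome to the programme.
"""
print(srt_to_webvtt(example_srt))
```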
3.7 Creative integrity and readability
- Preserve speaker intent and tone; annotate music and sound cues where necessary. Use readable fonts, size, contrast ratios, and positioning to avoid covering important onscreen information.
3.8 Live captioning policy
- Require certified live captioners for high-profile events; implement fallback automated captioning with post-event corrections when human captioning is unavailable. Maintain logs of live caption accuracy and incidents.
3.9 User controls and customization
- Provide users options for font size, color, background opacity, and positioning. Store preferences per user account and respect system-level accessibility settings.
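A minimal sketch of how such preferences might be modelled and stored against a user account (Python); the field names, defaults, and ranges are illustrative, not a mandated schema.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class CaptionPreferences:
    """Per-user caption display settings (illustrative field names)."""
    font_scale: float = 1.0            # multiplier on the platform default size
    text_color: str = "#FFFFFF"
    background_color: str = "#000000"
    background_opacity: float = 0.85   # 80-90% opacity recommended in 4.3
    position: str = "bottom"           # "bottom" | "top" | "custom"

    def validate(self) -> None:
        if not 0.5 <= self.font_scale <= 3.0:
            raise ValueError("font_scale outside supported range")
        if not 0.0 <= self.background_opacity <= 1.0:
            raise ValueError("background_opacity must be between 0 and 1")

prefs = CaptionPreferences(font_scale=1.5)
prefs.validate()
print(json.dumps(asdict(prefs)))   # what would be stored on the user account
```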
3.10 Training and workforce development
- Invest in captioner training, glossaries for proper nouns, and style guides. Offer incentives for translators and caption editors to follow the organization’s style and accuracy goals.
4. Technical specifications
4.1 File formats and containers
- Pre-recorded: primary master captions in IMSC1 (TTML) with timecodes, styling, and metadata. Provide WebVTT for streaming web delivery and SRT for legacy systems. Include sidecar files and burned-in versions where necessary.
- Live: use real-time captioning protocols such as EBU-TT Live or WebVTT fragments with low-latency delivery (CMAF, Low-Latency HLS/DASH).
4.2 Timing and synchronization
- Minimum display time: 1.2 seconds per caption block; aim for 1.5–3 seconds depending on reading speed and text complexity. Maximum characters per line: 37–42 for intralingual captions; ideal line length 32–37 characters. Maximum two lines on screen; allow three lines only for complex content with extended display time. Use smoothing algorithms to avoid rapid reflows.
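These limits are straightforward to enforce automatically. The sketch below checks a single caption block against the minimum display time, line count, and line-length limits above; the cue representation and thresholds are illustrative defaults.

```python
MIN_DURATION = 1.2        # seconds, per 4.2
MAX_LINES = 2
MAX_CHARS_PER_LINE = 42

def check_cue(start: float, end: float, text: str) -> list[str]:
    """Return a list of human-readable issues for one caption block."""
    issues = []
    duration = end - start
    if duration < MIN_DURATION:
        issues.append(f"displayed {duration:.2f}s, below the {MIN_DURATION}s minimum")
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        issues.append(f"{len(lines)} lines on screen (max {MAX_LINES})")
    for line in lines:
        if len(line) > MAX_CHARS_PER_LINE:
            issues.append(f"line exceeds {MAX_CHARS_PER_LINE} characters: {line!r}")
    return issues

print(check_cue(10.0, 10.8, "A line that is fine\nbut shown far too briefly"))
```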
4.3 Styling and readability
- Use sans-serif fonts (e.g., Arial, Helvetica, Roboto). Default font size should be 4–5% of vertical video resolution; allow user scaling. Ensure a minimum contrast ratio of 4.5:1 between text and background; use semi-opaque background boxes with 80–90% opacity for legibility over complex scenes.
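The 4.5:1 figure is the WCAG contrast ratio, computed from the relative luminance of the text and background colours. The sketch below implements that calculation for fully opaque colours; compositing a semi-opaque background box over moving video is not modelled here.

```python
def _channel(c8: int) -> float:
    """Linearize one 8-bit sRGB channel per the WCAG relative-luminance formula."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# White text on a black background box: 21:1, comfortably above the 4.5:1 floor.
print(round(contrast_ratio((255, 255, 255), (0, 0, 0)), 1))
```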
4.4 Positioning and safe area
- Default position at lower third; allow dynamic repositioning to avoid covering on-screen text and faces. Respect a 5% safe margin from screen edges for burned-in captions.
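For burned-in captions, the 5% margin translates into pixel bounds per frame size, as in this small illustrative helper.

```python
def safe_area(width_px: int, height_px: int, margin: float = 0.05):
    """Pixel bounds for burned-in captions, keeping a 5% margin on every edge."""
    mx, my = round(width_px * margin), round(height_px * margin)
    return (mx, my, width_px - mx, height_px - my)   # left, top, right, bottom

print(safe_area(1920, 1080))   # (96, 54, 1824, 1026)
```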
4.5 Speaker labeling and non-speech audio
- Use speaker labels when multiple speakers are present or speak in quick succession. Mark non-speech audio with concise descriptions in brackets, e.g., [siren approaching], [applause], [music: tense strings].
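In WebVTT, speaker attribution can use the format's built-in voice spans (<v>). The cues below are invented examples combining voice spans with bracketed non-speech cues, stored as a Python string so they can be dropped into test fixtures.

```python
# Invented example cues: WebVTT <v> voice spans for speaker labels plus
# bracketed non-speech cues.
EXAMPLE_CUES = """\
00:01:02.000 --> 00:01:04.500
<v Dana>We need to leave now.</v>

00:01:04.600 --> 00:01:07.000
[siren approaching]
<v Officer>Stay where you are.</v>
"""
print(EXAMPLE_CUES)
```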
4.6 Encoding and character support
- Use UTF-8 everywhere. Support full Unicode normalization (NFC) and explicit language tagging per caption block to aid rendering and screen readers.
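A minimal example of applying NFC normalization and writing caption files as UTF-8 explicitly (Python; the file name is illustrative).

```python
import unicodedata

def normalize_cue(text: str) -> str:
    """Normalize caption text to NFC so composed characters render consistently."""
    return unicodedata.normalize("NFC", text)

decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent
composed = "caf\u00e9"      # precomposed 'é'
print(decomposed == composed)                    # False
print(normalize_cue(decomposed) == composed)     # True

# Write sidecar files as UTF-8 explicitly rather than relying on platform defaults.
with open("captions.vtt", "w", encoding="utf-8") as fh:
    fh.write("WEBVTT\n\n00:00:01.000 --> 00:00:02.000\n" + normalize_cue(decomposed) + "\n")
```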
4.7 Accessibility features beyond text
- Support audio description tracks and separate metadata tracks for object descriptions where necessary. Provide APIs for screen readers and assistive tech to access caption streams.
4.8 Error handling and fallbacks
- If captions are unavailable, display a brief on-screen notification and provide an estimated time for availability. For live streams, switch to automated captions while indicating “Auto-generated” clearly to users.
5. Production workflow
5.1 Authoring
- Use robust captioning tools that export to standard formats and embed metadata. Maintain version control for caption files.
5.2 Review and QC
- Implement multi-stage QC: automated checks (timing, overlaps, encoding), linguistic review (spelling, punctuation), and visual verification (positioning, clipping). Track QC metrics and produce weekly reports.
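Automated checks can start very simply. The sketch below flags overlapping cue time ranges, one of the checks named above; the cue representation is illustrative.

```python
def find_overlaps(cues):
    """Flag cues whose time ranges overlap.

    Cues are (start_seconds, end_seconds, text) tuples, sorted by start time.
    """
    issues = []
    for (s1, e1, _), (s2, e2, t2) in zip(cues, cues[1:]):
        if s2 < e1:
            issues.append(f"cue starting at {s2:.3f}s overlaps the previous cue: {t2!r}")
    return issues

cues = [(1.0, 3.0, "First line"), (2.5, 4.0, "Starts too early")]
print(find_overlaps(cues))
```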
5.3 Integration with editors and localization platforms
- Integrate captioning with video editing and localization systems via APIs to maintain synchronization with edits and translations.
5.4 Automation and AI assist
- Use ASR for first-pass transcription, then route to human editors for correction. Leverage MT with post-editing for interlingual subtitles; maintain glossaries and translation memories.
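A toy illustration of a glossary pass over an ASR first draft (Python). Real pipelines operate on tokenized, case- and context-aware text, and corrections should still be reviewed by human editors; the glossary entries here are invented.

```python
# Toy glossary of known ASR mis-transcriptions -> preferred spellings.
GLOSSARY = {
    "web vtt": "WebVTT",
    "imsc one": "IMSC1",
}

def apply_glossary(draft: str, glossary: dict[str, str]) -> str:
    """Apply simple string substitutions from a term glossary to an ASR draft."""
    corrected = draft
    for wrong, right in glossary.items():
        corrected = corrected.replace(wrong, right)
    return corrected

print(apply_glossary("the master file is imsc one, delivery is web vtt", GLOSSARY))
```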
6. Testing, metrics, and compliance
6.1 Key metrics
- Word Error Rate (WER), Character Error Rate (CER), latency (seconds), display/readability score (user testing), compliance rate (percentage of catalog with captions), and user satisfaction scores.
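For reference, WER and CER are both edit-distance metrics. The sketch below computes them from a reference transcript and a hypothesis using a standard Levenshtein distance; it is an illustrative implementation, not a mandated scoring tool.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance over tokens (deletions, insertions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # delete
                            curr[j - 1] + 1,          # insert
                            prev[j - 1] + (r != h)))  # substitute
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)

print(round(wer("turn left at the next exit", "turn left at next exit"), 2))   # 0.17
```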
6.2 Test suites
- Develop synthetic and real-world test sets covering background noise, overlapping speech, multiple speakers, music, accents, and technical vocabulary.
6.3 User testing
- Run regular sessions with deaf and hard-of-hearing participants to validate readability, timing, speaker labeling, and localization quality.
6.4 Reporting and audits
- Produce quarterly compliance reports and remediate items below thresholds within defined SLAs.
7. Implementation roadmap
Phase 1 (0–3 months): policy adoption, baseline audit, immediate fixes for critical content.
Phase 2 (3–9 months): tooling upgrades, format standardization, pilot live captioning improvements.
Phase 3 (9–18 months): full rollout of user controls, localization pipeline, and QA automation.
Phase 4 (18–24 months): compliance audits, user testing cycles, and continuous improvement.
8. Budgeting and resourcing
Estimate staffing: caption editors, QA testers, localization managers, and engineering support. Budget for licensing or building captioning tools, training, and user testing. Include contingency for live captioning vendor costs.
9. Governance and community engagement
Create a cross-functional steering committee to oversee policy, technical updates, and exception handling. Engage deaf and hard-of-hearing organizations for continuous feedback and to validate outcomes.
10. Example policy snippets (for inclusion in handbooks)
- “All public-facing video assets must have closed captions in the primary language before publishing.”
- “Live events require a certified live captioner; if one is not available, auto-generated captions may be used with prompt post-event correction.”
11. Conclusion
Adopting these policy recommendations and technical specifications will significantly improve accessibility, viewer experience, and legal compliance. The Subtitles Workgroup recommends phased, measurable implementation with active involvement from accessibility stakeholders to ensure real-world effectiveness.