Slide 1: SyNTHEMA Speech & Language Technologies
State of the Art from an Industrial Perspective
Carlo Aliprandi, Synthema srl

Slide 2: Company Profile
Based in Pisa (Italy), SyNTHEMA is a high-technology SME established in 1993 by computer scientists from the IBM Research Center. Since then, the company has evolved rapidly, becoming a leading provider of language and semantic solutions, with state-of-the-art technologies for applications such as Enterprise Search, Audio & Text Mining, Technology Watch, Competitive Intelligence, Speech Recognition, Respeaking and Speech Analytics. Grounding its leadership in strong IT research and development, SyNTHEMA has pioneered a number of innovative applications and solutions, used daily by a large number of users for productivity tasks across different markets and industries, including Homeland Security, Intelligence and Law Enforcement, Public Administration and Government, Healthcare and Media.

Slide 3: Structure and Activities
• 30 people (20 IT, 10 Localisation Services)
• Semantic Technology
• Translation Technology
• Speech Technology

Slide 4: Natural Language
Sources: Ethnologue; Netz-Tipp.De (http://www.netz-tipp.de/languages.html)

Slide 5: Language Technologies, Some Examples
WRITTEN LANGUAGE
• Machine Translation
• Semantics
• Natural-language search, Information Retrieval, Question Answering
SPOKEN LANGUAGE
• Speech Recognition (Speech-to-Text)
• Respeaking
• Automatic Transcription
• Assisted Subtitling
• Spoken Language Understanding
• Dialogue management (avatars, …)

Slide 6: Semantics
The Italian market offers state of the art for:
• Lemmatisation
• POS Tagging
• MultiWord Detection (MWD)
• Named Entity Recognition (NER)
• Parsing (dependency, constituency)
• Word Sense Disambiguation (WSD)
• Sentiment Analysis (SA)
• Semantic Role Labeling (SRL)
Languages:

Slide 7: Semantics
• Is it a cool topic?
– Bing / Microsoft – Powerset (linguistic processor)
– Google – Applied Semantics (ontology, i.e. a knowledge base of concepts and their relationships, coupled with a linguistic processing engine)
– Google Squared (structures the unstructured data on web pages)
– Hakia (meaning-based search engine: ontology and semantic lexicon, ontological parser)
– WolframAlpha (computational knowledge engine: distilled and revised knowledge, NL queries, rich visualisation; knowledge engineering, language dependent)
– IBM Watson (Jeopardy!)
• While waiting for the killer app, there is a latent demand for "Semantic Search"

Slide 8: Speech Technology
The Italian market offers state of the art for:
• Automatic Speech Recognition
• Automatic Transcription
• Dialogue Systems
• Speech Analytics
Languages:

Slide 9: The Evolution of Dictation
• 1st generation (1990-2000): application of ASR products to respeaking
– Players (technology for CSR):
• IBM ViaVoice, Dragon DNS, L&H Xspeech, Philips FreeSpeech, Kurzweil, Nuance, Loquendo and others (>10!)
• Tools plugged into existing subtitling solutions
– Technology benefits:
• Speaker dependent, great accuracy and large accent coverage
• Large vocabularies available (LVSR)
• Good accuracy, up to 95-97%
• Good throughput (up to 170 wpm)
– Some technology limitations:
• SR mainly designed for dictation
• SR available only for 'general' domains / main languages
• Partial coverage of specific domains (news, politics, economy, gossip, …)
• Problems dealing with Out-of-Vocabulary (OOV) words
• Error correction (live and deferred)
• Improvement of language models
– But main benefits:
• The technology allows fast training of new (untrained) staff
• The technology is affordable and low-cost, with no need for huge investments
• Well suited to pre-recorded and close-to-live programmes
– And main operating limitations:
• Typically supports a single operator (the respeaker)
• The respeaker 'alone' has to face a challenging task, with a big cognitive overload
• Hardly suited to live programmes (talk shows, interviews, …)

Slide 10: The Evolution of Dictation
• 2nd generation (2000-2010):
– Global players: Nuance DNS, Philips SpeechMagic, IBM ViaVoice
– Technology benefits:
• Speaker dependent, great accuracy and large accent coverage
• Large vocabularies available (LVSR)
• Good accuracy, up to 97-99%
• Good throughput (up to 170 wpm)
– Overcome technology limitations (limitation -> solution):
• SR mainly designed for dictation -> adaptation to different speech styles (conversational speech); reduced training time (30' -> 5')
• SR available only for 'general' domains -> development of specific topics (news, politics, …); pre-analysis of similar texts/scripts
• Problems dealing with OOV words -> live management (editing + insertion) of OOV words
• Error correction (live and deferred) -> live: dual-operator systems (respeaker + corrector)
• Improvement of language models -> respeaked speech and aligned scripts saved; error corrections (lettuce -> let us) used to improve language models
– Benefits:
• Suited to 'major' live programmes (news, sports events)
– And main operating
limitations:
• The respeaker still has to face a cognitive overload
• Not completely suited to specific kinds of live programmes (chat magazines, talk shows, major political debates, …)
• Subtitles are introduced with some delay (5-7' acceptable)

Slide 11: The Present (and Future) of Dictation
• 3rd generation (2010-2015):
– Global player technology for CSR:
• Nuance DNS (and no others!)
– Emerging providers of new professional SR technology:
• New ASR engines for (batch and live) transcription
• Speaker-independent systems (Nuance Dictate, IBM Attila, …)
• SR engines for smartphones and cloud services (Google Speech, Apple, Facebook, …)
– New emerging interests and applications:
• Audio alignment and segmentation
• Audio annotation and indexing for cross-media search
• Media monitoring

Slide 12: ASR from an Industry Perspective
• Needs?
– ASR has several limitations, because it was designed for dictation applications and thus performs too poorly in specific tasks, such as subtitling
– Language coverage may be limited, as commercial systems have been developed to target the main language markets (i.e. English, Spanish, French, German, …) and are not available for many languages and dialects
– Domain coverage may be limited, as commercial systems have been developed to target general and generic topics
• Limitations:
– Data: the resources (raw data, tagged data, models) needed to build ASR technology are not available for several languages
– Needs are different, from the market perspective

Slide 13: SAVAS
• Is ASR good enough for an application task like subtitling?
• Is an IT provider (academia or R&D) sufficient to fulfil market needs (improving operations, new offerings, …)?
• Reporting is different (vs respeaking):
– Not real time
– Typically verbatim (or close to)
– Different audience
– No persistence and visualization boundaries (colors, formatting, audio descriptors, …)
• Dictation has proved to be a valid alternative for subtitling, taking over traditional reporting methods
• Traditional reporting methods, like fast keyboarding and stenotyping, were adopted early
• SAVAS brings together broadcasters, subtitling companies, universities and companies involved in the Media, Accessibility and LVCSR industries

Slide 14: Speech Recognition
• Dictation
– Dictation is the interactive composition of text
– Medical reports, court and parliamentary proceedings
• Transcription
– Transcription is transforming speech into text (batch or online)
• Dialogue
– CRM, device control, navigation, call routing
• Multimedia Mining
– Audio2text; Text2Audio

Slide 15: Thank You
• Q&A
Courtesy of Carlo Aliprandi
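The accuracy figures quoted on slides 9 and 10 (95-97%, 97-99%) are conventionally word accuracy, i.e. 1 minus the Word Error Rate (WER), the standard ASR metric. A minimal sketch of WER computation via word-level Levenshtein distance (the function name and example sentences are illustrative, not from the deck):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions)
    divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match / substitution
    return dp[-1][-1] / len(ref)

print(wer("the quick brown fox", "the quick brown fox"))  # 0.0
print(wer("the quick brown fox", "the quik brown"))       # 0.5 (1 sub + 1 del over 4 words)
```

Note that WER can exceed 1.0 when the recogniser inserts many spurious words, which is one reason "accuracy up to 97-99%" figures are usually reported on matched, in-domain dictation conditions.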