SyNTHEMA
Speech & Language Technologies
State of the art from an Industry perspective
Carlo Aliprandi
Synthema srl
Company Profile
Based in Pisa (Italy), SyNTHEMA is a high-technology SME that was
established in 1993 by computer scientists from the IBM Research Center.
Since then, the company has rapidly evolved, becoming a leading provider
of Language and Semantic solutions, with state-of-the-art technologies
for applications like Enterprise Search, Audio & Text Mining, Technology
Watch, Competitive Intelligence, Speech Recognition, Respeaking and
Speech Analytics.
Grounding its leadership in strong IT Research and Development,
SyNTHEMA has pioneered a number of innovative applications and
solutions, adopted on a daily basis by a vast number of users to perform
productivity tasks in different markets and industries, including Homeland
Security, Intelligence and Law Enforcement, Public Administration and
Government, Healthcare and Media.
Structure and activities
• 30 People (20 IT, 10 Localisation Services)
• Semantic Technology
• Translation Technology
• Speech Technology
Natural Language
Sources: Ethnologue; Netz-Tipp.De (http://www.netz-tipp.de/languages.html)
Language Technologies, some examples
WRITTEN LANGUAGE
• Machine Translation
• Semantics
• Natural-language search
• Information Retrieval
• Question Answering
SPOKEN LANGUAGE
• Speech Recognition – Speech to Text
  – Respeaking
  – Automatic Transcription
  – Assisted Subtitling
• Spoken Language Understanding
• Dialogue management (Avatars, ...)
Semantics
The Italian market offers state-of-the-art solutions for:
• Lemmatisation
• POS Tagging
• MultiWord Detection (MWD)
• Named Entity Recognition (NER)
• Parsing (dependency – constituency)
• Word Sense Disambiguation (WSD)
• Sentiment Analysis (SA)
• Semantic Role Labeling (SRL)
Languages:
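As a toy illustration of the first pipeline stages listed above (lemmatisation, POS tagging and gazetteer-style NER), the sketch below shows only the data flow; the lexicons are invented examples, and real systems use statistical models rather than fixed dictionaries:

```python
# Toy semantic-pipeline sketch: lemmatisation, POS tagging and
# dictionary (gazetteer) based Named Entity Recognition.
# All lexicons here are invented illustrative examples.

LEMMAS = {"provides": "provide", "solutions": "solution"}
POS = {"provide": "VERB", "solution": "NOUN",
       "synthema": "PROPN", "pisa": "PROPN", "in": "ADP"}
ENTITIES = {"synthema": "ORG", "pisa": "LOC"}  # gazetteer-style NER

def analyse(sentence):
    tokens = sentence.lower().rstrip(".").split()
    result = []
    for tok in tokens:
        lemma = LEMMAS.get(tok, tok)            # lemmatisation
        result.append({"token": tok,
                       "lemma": lemma,
                       "pos": POS.get(lemma, "X"),   # POS tagging
                       "entity": ENTITIES.get(tok)}) # NER lookup
    return result

for row in analyse("SyNTHEMA provides solutions in Pisa."):
    print(row)
```

Each downstream stage on the list (WSD, sentiment, semantic roles) would consume exactly this kind of per-token annotation.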
Semantics
• Is it a cool topic?
  – Bing Microsoft – Powerset (linguistic processor)
  – Google – Applied Semantics (ontology, or knowledge base of concepts
    and their relationships, coupled with a linguistic processing engine)
  – Google Squared (structures the unstructured data on web pages)
  – Hakia (meaning-based search engine, ontology and semantic lexicon,
    ontological parser)
  – WolframAlpha
    + computational knowledge engine, distilled and revised knowledge,
      NL query, rich visualisation
    - knowledge engineering, language dependent
  – IBM Watson (Jeopardy!)
• While waiting for the killer app, there is a latent demand for
  “Semantic Search”
Speech Technology
The Italian market offers state-of-the-art solutions for:
• Automatic Speech Recognition
• Automatic Transcription
• Dialogue Systems
• Speech Analytics
Languages:
The evolution of Dictation
• 1st generation: 1990-2000, application of ASR products to respeaking
  – Players (technology for CSR):
    • IBM ViaVoice, Dragon DNS, L&H Xspeech, Philips FreeSpeech,
      Kurzweil, Nuance, Loquendo and others (>10!): tools plugged into
      existing subtitling solutions
  – Technology benefits:
    • Speaker dependent, great accuracy and large accent coverage
    • Large vocabularies available (LVSR)
    • Good accuracy, up to 95-97%
    • Good throughput (up to 170 wpm)
  – Some technology limitations:
    • SR mainly designed for dictation
    • SR available only for ‘general’ domains / main languages
    • Partial coverage of specific domains (news, politics, economy, gossip, ...)
    • Problems dealing with out-of-vocabulary words
    • Error correction (live and deferred)
    • Improvement of language models
  – But the main benefits:
    • The technology allows fast training of new (untrained) staff
    • The technology is affordable and low-cost: no need for huge investments
    • Well suited to pre-recorded and close-to-live programs
  – And the main operating limitations:
    • Typically supports a single operator (the respeaker)
    • The respeaker ‘alone’ has to face a challenging task, with a big
      cognitive overload
    • Hardly suited to live programs (talk shows, interviews, ...)
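Accuracy figures like the “95-97%” above are conventionally reported as one minus the word error rate (WER), computed from the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and the recogniser output. A minimal sketch, with invented example sentences:

```python
# Word error rate (WER) via word-level edit distance.
# accuracy = 1 - WER, so WER 0.03-0.05 corresponds to 95-97% accuracy.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

ref = "recognise speech with high accuracy"
hyp = "wreck a nice speech with high accuracy"
print(f"WER = {wer(ref, hyp):.2f}")  # prints "WER = 0.60"
```

The respeaking workflows described on this slide exist precisely to push this number down for broadcast material that dictation-oriented engines handle poorly.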
The evolution of Dictation
• 2nd generation: 2000-2010
  – Global players: Nuance DNS, Philips SpeechMagic, IBM ViaVoice
  – Technology benefits:
    • Speaker dependent, great accuracy and large accent coverage
    • Large vocabularies available (LVSR)
    • Good accuracy, up to 97-99%
    • Good throughput (up to 170 wpm)
  – Overcome technology limitations:
    • SR mainly designed for dictation
    • SR available for ‘general’ domains
    • Problems dealing with out-of-vocabulary words (OOV)
    • Error correction (live and deferred)
    • Improvement of language models
    -> Adaptation to different speech (conversational speech)
    -> Reduced training time (30’ -> 5’)
    -> Development of specific topics (news, politics, ...)
    -> Pre-analysis of similar texts/scripts
    -> Live management (editing + insertion) of OOV words
    -> Live: dual-operator systems (respeaker + corrector)
    -> Respoken speech and aligned scripts saved: error correction
       (lettuce - let’us) improving language models
  – Benefits:
    • Suited to ‘major’ live programs (news, sport events)
  – And the main operating limitations:
    • The respeaker still has to face a cognitive overload
    • Not completely suited to specific kinds of live programs (chat
      magazines, talk shows, major political debates, ...)
    • Subtitles introduced with some delay (5-7’ acceptable)
The present (and future) of Dictation
• 3rd generation: 2010-2015
  – Global player technology for CSR:
    • Nuance DNS (and no others!)
  – Emerging providers of new professional SR technology:
    • New ASR engines for (batch and live) transcription
    • Speaker-independent systems (Nuance Dictate, IBM Attila, ...)
    • SR engines for smartphones and cloud services (Google Speech,
      Apple, Facebook, ...)
  – New emerging interests and applications:
    • Audio alignment and segmentation
    • Audio annotation and indexing for cross-media search
    • Media monitoring
ASR from an Industry perspective
• Needs?
  – ASR has several limitations because it has been designed for
    dictation applications, and thus performs too poorly in specific
    tasks, like subtitling
  – Language coverage may be limited, as commercial systems have been
    developed to target the main language markets (i.e. English,
    Spanish, French, German, ...) and are not available for many
    languages and dialects
  – Domain coverage may be limited, as commercial systems have been
    developed to target general and generic topics
• Limitations
  – Data: the resources (raw data – tagged data – models) needed to
    build an ASR technology are not available for several languages
  – Needs differ, from the market perspective
SAVAS
• Is ASR good enough for an application task like subtitling?
• Is an IT provider (academic or R&D) sufficient to fulfil market needs
  (improving operations, new offerings, ...)?
• Reporting is different (vs respeaking):
  – Not real time
  – Typically verbatim (or close to it)
  – Different audience
  – No persistence and visualisation boundaries (colours, formatting,
    audio descriptors, ...)
• Dictation has proved to be a valid alternative for subtitling, taking
  over traditional reporting methods
• Traditional reporting methods, like fast keyboarding and stenotyping,
  were adopted early
• SAVAS brings together broadcasters, subtitling companies, universities
  and companies involved in the Media, Accessibility and LVCSR industries
Speech Recognition
• Dictation
  – Dictation is the interactive composition of text
  – Medical reports, court and parliamentary proceedings
• Transcription
  – Transcription is transforming speech into text (batch – online)
• Dialogue
  – CRM, device control, navigation, call routing
• Multimedia Mining
  – Audio2text; Text2Audio
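The batch vs online distinction in the transcription bullet above can be sketched with a stub recogniser: batch mode decodes a whole recording into one final text, while online mode yields growing partial transcripts as chunks arrive, the way a live subtitling pipeline consumes them. The decoding step is faked here; a real engine would run acoustic and language models:

```python
# Toy contrast of batch vs online (streaming) transcription.
# recognise() is a stub: a real ASR engine would decode audio here.

def recognise(chunk: str) -> str:
    """Stub recogniser; real engines return model hypotheses."""
    return chunk.lower()

def batch_transcribe(audio_chunks):
    # Batch mode: process the whole recording, return one final text.
    return " ".join(recognise(c) for c in audio_chunks)

def online_transcribe(audio_chunks):
    # Online mode: yield a growing partial transcript per chunk.
    partial = []
    for c in audio_chunks:
        partial.append(recognise(c))
        yield " ".join(partial)

chunks = ["Good", "Morning", "Pisa"]
print(batch_transcribe(chunks))   # final transcript only
for p in online_transcribe(chunks):
    print(p)                      # successive partial transcripts
```

Dictation sits between the two: it is online like streaming transcription, but interactive, with the speaker correcting the partial output as it appears.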
Thank you
– Q&A