Università degli Studi di Firenze
Dipartimento di Meccanica e Tecnologie Industriali
On TFSR (semi)automatic systems supportability: novel instruments for analysis and compensation
Francesco Borchi, Monica Carfagni, Matteo Nunziati
Outline
• Main goal
• TFSR Systems
• LogR estimation
• Common test procedures for TFSR systems
• System behaviour classification
• Supportability evaluation tools
• Score compensation tools
• Quality assessment logics
• Conclusion
Main goal
Our goal is to propose a general-purpose set of tools for system compensation and quality assessment.
Specific goals:
1. to build a generic framework for system analysis
2. to develop a novel generic tool for system compensation
3. to assess the system quality level on the basis of the amount of compensation required by the system itself
TFSR Systems
[Diagram: Voice sample 1 and Voice sample 2 → TFSR system → LogR]
We define a TFSR system as a black box that receives two or more recordings as input and produces one or more scores (LogR) as output.
LogR estimation 1/2
$$\mathrm{LogR} = \log_{10}\frac{P(E \mid H_0)}{P(E \mid H_1)}$$
where E denotes the evidence. The log-likelihood ratio defines the most supportable hypothesis:
Hypothesis 0 (H0): the two samples belong to the same speaker.
Hypothesis 1 (H1): the two samples belong to different speakers.
• If LogR > 0, support goes to hypothesis H0
• If LogR < 0, support goes to hypothesis H1
• If LogR = 0, no support is provided
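The sign rule above can be illustrated with a minimal Python sketch. It assumes the two likelihoods P(E | H0) and P(E | H1) are already available as positive numbers, which a real TFSR system only approximates; the function names `log_lr` and `supported_hypothesis` are hypothetical.

```python
import math


def log_lr(p_e_given_h0, p_e_given_h1):
    """LogR = log10[P(E | H0) / P(E | H1)] for known, strictly positive likelihoods."""
    if p_e_given_h0 <= 0 or p_e_given_h1 <= 0:
        raise ValueError("both likelihoods must be strictly positive")
    return math.log10(p_e_given_h0 / p_e_given_h1)


def supported_hypothesis(logr):
    """Map the sign of LogR to the hypothesis it supports."""
    if logr > 0:
        return "H0"    # same speaker
    if logr < 0:
        return "H1"    # different speakers
    return "none"      # LogR == 0: no support either way


# Evidence 100 times more likely under H0 than under H1
logr = log_lr(0.5, 0.005)
print(f"LogR = {logr:.2f}, support for {supported_hypothesis(logr)}")
```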
LogR estimation 2/2
The true LogR value is unknown: we can only estimate it through approximations, so our systems are error-prone.
The quality of a system depends on a number of factors:
• The way the voice samples were collected
• The kind of parameters employed in the recognition
• The algorithms used for parameter extraction
• The mathematical model used to estimate LogR
Experimentation is the best way to assess system behaviour.
Common test procedures for TFSR systems 1/2
The system is tested against a set of recordings of known origin:
[Figure: Speaker1 … SpeakerN, each with 2 or more recordings]
Common test procedures for TFSR systems 2/2
Recordings are mixed up and grouped in pairs:
Same-speaker pairs (SS)
Different-speaker pairs (DS)
• SS: tests system behaviour when H0 is true. Is LogR > 0?
• DS: tests system behaviour when H1 is true. Is LogR < 0?
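The pairing step can be sketched as follows, assuming the test set is held as a dict from a speaker label to that speaker's recordings (file names here). The helper `make_pairs` is hypothetical, and exhaustive pairing is itself an assumption: the slides do not say whether all pairs or a sample of them are used.

```python
from itertools import combinations


def make_pairs(dataset):
    """Split every pair of recordings into SS and DS pairs.

    `dataset` maps a speaker label to the list of that speaker's recordings
    (two or more per speaker). Returns (ss_pairs, ds_pairs).
    """
    labelled = [(speaker, rec) for speaker, recs in dataset.items() for rec in recs]
    ss_pairs, ds_pairs = [], []
    for (spk_a, rec_a), (spk_b, rec_b) in combinations(labelled, 2):
        if spk_a == spk_b:
            ss_pairs.append((rec_a, rec_b))  # same speaker: H0 is true
        else:
            ds_pairs.append((rec_a, rec_b))  # different speakers: H1 is true
    return ss_pairs, ds_pairs


# Hypothetical test set: file names only, no audio is loaded
dataset = {"Speaker1": ["s1_a.wav", "s1_b.wav"],
           "Speaker2": ["s2_a.wav", "s2_b.wav", "s2_c.wav"]}
ss, ds = make_pairs(dataset)
print(len(ss), "SS pairs,", len(ds), "DS pairs")
```

With exhaustive pairing, DS pairs quickly outnumber SS pairs as the number of speakers grows, which is one reason the two groups are evaluated separately.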
System behaviour classification 1/3
Tippett plot: a common method to show system behaviour
[Figure: Tippett plot with the % SS (H0) and % DS (H1) curves; false negatives and false positives marked]
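The two curves of a Tippett plot can be computed from the SS and DS scores as sketched below, under the convention (assumed here) that both curves show the proportion of scores greater than x. The function name `tippett_curves` and the toy scores are hypothetical; plotting the two fractions against x gives the plot.

```python
def tippett_curves(ss_scores, ds_scores, thresholds):
    """Empirical Tippett curves for a set of threshold values x.

    Returns a list of (x, P(LogR > x | H0), P(LogR > x | H1)) tuples, where
    each probability is estimated as the fraction of SS (same-speaker) or
    DS (different-speaker) scores strictly greater than x.
    """
    def fraction_above(scores, x):
        return sum(s > x for s in scores) / len(scores)

    return [(x, fraction_above(ss_scores, x), fraction_above(ds_scores, x))
            for x in thresholds]


ss = [1.0, 2.0, -1.0, 3.0]    # same-speaker scores: -1.0 is a false negative
ds = [-2.0, -3.0, 0.5, -1.0]  # different-speaker scores: 0.5 is a false positive
for x, p_ss, p_ds in tippett_curves(ss, ds, [-2.0, 0.0, 2.0]):
    print(f"x = {x:+.1f}   %SS above x = {100 * p_ss:5.1f}   %DS above x = {100 * p_ds:5.1f}")
```

At x = 0 the SS curve falls short of 100% by the share of false negatives, and the DS curve stays above 0% by the share of false positives.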
System behaviour classification 2/3
[Figure: Tippett plot regions containing only false scores, i.e. wrong support (red boxes)]
Provide a solution to eliminate “false score only” areas (red boxes).
System behaviour classification 3/3
• Isoperforming
• Hypoperforming
Provide a solution to reduce the amount of false scores.
Supportability evaluation tools 1/3
A quantitative evaluation of false scores was proposed by P. Rose et al. (2003):
$$\mathrm{LR}_{test} = \frac{P(\mathrm{LogR} > 0 \mid H_0)}{P(\mathrm{LogR} > 0 \mid H_1)}$$
where the numerator is the percentage of true positives and the denominator is the percentage of false positives.
• Interpretable via the Evett table
• No information is provided about false negatives
• No information is provided about the distribution of false scores
Do they affect a narrow range of scores? Do they widely perturb the system response?
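A minimal sketch of LRtest estimated from empirical SS and DS scores follows. It assumes the probabilities are replaced by observed frequencies; the name `lr_test` and the toy scores are hypothetical, and returning infinity when there are no false positives is a choice made here, not something the slides specify.

```python
import math


def lr_test(ss_scores, ds_scores):
    """LRtest = P(LogR > 0 | H0) / P(LogR > 0 | H1), estimated from test scores."""
    true_pos = sum(s > 0 for s in ss_scores) / len(ss_scores)   # SS pairs supporting H0
    false_pos = sum(s > 0 for s in ds_scores) / len(ds_scores)  # DS pairs wrongly supporting H0
    if false_pos == 0:
        # No false positives: unbounded ratio (undefined if there are no positives at all)
        return math.inf if true_pos > 0 else math.nan
    return true_pos / false_pos


ss = [1.0, 2.0, -1.0, 3.0]    # -1.0 is a false negative, invisible to LRtest
ds = [-2.0, -3.0, 0.5, -1.0]  # 0.5 is a false positive
print(lr_test(ss, ds))
```

Moving the false negative from -1.0 to -100.0 leaves the result unchanged, which illustrates both limitations listed above: LRtest ignores false negatives and says nothing about where the false scores lie.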
Supportability evaluation tools 2/3
We propose to generalize the LRtest index with a new tool, the “Supportability of System” function (SoS):
$$\mathrm{SoS}(x) = \begin{cases} \dfrac{P(\mathrm{LogR} > x \mid H_0)}{P(\mathrm{LogR} > x \mid H_1)} & \text{if } x > 0 \\[2ex] \dfrac{1 - P(\mathrm{LogR} > x \mid H_1)}{1 - P(\mathrm{LogR} > x \mid H_0)} & \text{if } x < 0 \end{cases}$$
We know how much we can rely on our system, case by case!
• Interpretable via the Evett table
• Defined for both false positives and false negatives
• Unambiguously quantifies the amount of false scores for each LogR value
• Provides the accuracy of each score
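The SoS function can be estimated from test scores as sketched below, again replacing probabilities with observed frequencies. For x < 0 the code counts scores at or below x directly, since 1 − P(LogR > x | H) = P(LogR ≤ x | H). The name `sos`, the use of the x > 0 branch at x = 0 (which makes SoS(0) equal to LRtest), and the infinite value when no false scores exist are assumptions of this sketch.

```python
import math


def sos(x, ss_scores, ds_scores):
    """Supportability of System at score value x, from empirical frequencies.

    x >= 0: P(LogR > x | H0) / P(LogR > x | H1)            (true / false positives)
    x <  0: [1 - P(LogR > x | H1)] / [1 - P(LogR > x | H0)]  (true / false negatives)
    """
    if x >= 0:
        num = sum(s > x for s in ss_scores) / len(ss_scores)   # true positives above x
        den = sum(s > x for s in ds_scores) / len(ds_scores)   # false positives above x
    else:
        num = sum(s <= x for s in ds_scores) / len(ds_scores)  # true negatives at or below x
        den = sum(s <= x for s in ss_scores) / len(ss_scores)  # false negatives at or below x
    if den == 0:
        # No false scores beyond x: unbounded support (undefined if no scores at all)
        return math.inf if num > 0 else math.nan
    return num / den


ds = [-20.0] * 9 + [5.0]          # 90% of DS scores lie at or below -13 (true negatives)
ss = [-15.0, 1.0, 2.0, 3.0, 4.0]  # 20% of SS scores lie at or below -13 (false negatives)
print(sos(-13.0, ss, ds))
```

The toy data are built so that the call reproduces the 90/20 reading used in the worked example that follows.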
Supportability evaluation tools 3/3
[Figure: SoS reading at a single score value]
At LogR = -13: 90% true scores and 20% false scores, so SoS = 90/20 = 4.5.
Score compensation tools 1/3
[Figure: original and translated scores, shifted by DX]
Preliminary operation: eliminate “false score only” areas by increasing or reducing all scores (translation by DX).
Score compensation tools 2/3
$$\mathrm{New\ LogR} = \mathrm{LogR} \cdot \tanh\left(\log_{10}\mathrm{SoS}\right)$$
[Figure: compensation curves for LogR = 1, 2, 3, 4]
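The compensation formula can be sketched as a one-line scaling. This assumes, following the reading of SoS as the accuracy of each score, that each score is compressed with the SoS value computed at that same LogR; the function name `compensate` is hypothetical. Because tanh lies in (−1, 1), the compensated score is never larger in magnitude than the original one.

```python
import math


def compensate(logr, sos_value):
    """New LogR = LogR * tanh(log10(SoS)).

    Large SoS leaves the score almost unchanged, SoS = 1 collapses it to 0,
    and SoS < 1 makes the factor negative, so the score changes sign.
    """
    if sos_value == 0:
        factor = -1.0  # limit of tanh(log10(SoS)) as SoS -> 0
    else:
        factor = math.tanh(math.log10(sos_value))  # log10(inf) = inf and tanh(inf) = 1
    return logr * factor


for sos_value in (0.5, 1.0, 10.0, 1000.0):
    print(sos_value, round(compensate(2.0, sos_value), 3))
```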
Score compensation tools 3/3
Compress all scores by a factor defined by the SoS function.
[Figure: original vs. compressed scores; reduced amount of false scores, decreased values of true scores]
Reduce the amount of false scores at the cost of a lower discriminative power.
Quality assessment logics 1/3
– Score compensation reduces the system’s discriminative power
+ Score compensation is required to prevent unbalanced
responses
•Compensation increases for decreasing values of SoS
•Compensation is intrinsic to the system
•A good system must have a strong SoS for each LogR value
Quality assessment logics 2/3
DMTI procedure
•Step 1: test the system against a dataset (LogR)
•Step 2: calculate supportability (SoS)
•Step 3: calculate compensated scores (New LogR)
•Step 4: calculate the fraction P of new LogR values that have a “strong” SoS (threshold fixed by our standards)
•Step 5: evaluate the Degree of Supportability (DoS):
$$\mathrm{DoS} = \operatorname{atanh}(2P - 1)$$
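Steps 4 and 5 can be sketched as follows, assuming the SoS value of every compensated score has already been obtained in steps 1 to 3. The names `fraction_strong` and `degree_of_supportability` are hypothetical, and so is the `strong_sos` threshold, whose actual value the procedure leaves to the adopted standard. Since atanh(2P − 1) is finite only for 0 < P < 1, P is treated here as a fraction rather than a percentage.

```python
import math


def fraction_strong(sos_values, strong_sos):
    """Step 4: fraction P of compensated scores whose SoS reaches the "strong" level.

    `strong_sos` is a HYPOTHETICAL threshold standing in for the level fixed by the
    adopted standard. NaN SoS values compare False and so count as not strong.
    """
    if not sos_values:
        raise ValueError("need at least one SoS value")
    return sum(v >= strong_sos for v in sos_values) / len(sos_values)


def degree_of_supportability(p):
    """Step 5: DoS = atanh(2P - 1), with P a fraction in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must be a fraction in [0, 1], not a percentage")
    if p == 1.0:
        return math.inf   # every score is strongly supported
    if p == 0.0:
        return -math.inf  # no score is strongly supported
    return math.atanh(2.0 * p - 1.0)


# SoS of four compensated scores, with a hypothetical "strong" threshold of 100
p = fraction_strong([200.0, 5.0, 1500.0, math.inf], strong_sos=100.0)
print(p, degree_of_supportability(p))
```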
Quality assessment logics 3/3
Regardless of the specific procedure, our DoS score is equivalent to
a LogR score!
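One way to see why DoS behaves like a LogR score is to expand atanh through its logarithmic form, atanh(y) = ½ ln[(1 + y)/(1 − y)], with y = 2P − 1:

$$\mathrm{DoS} = \operatorname{atanh}(2P-1) = \frac{1}{2}\ln\frac{1+(2P-1)}{1-(2P-1)} = \frac{1}{2}\ln\frac{P}{1-P} = \frac{\ln 10}{2}\,\log_{10}\frac{P}{1-P}$$

DoS is therefore a scaled log-odds of a score being strongly supported: positive for P > 0.5, zero for P = 0.5 and negative for P < 0.5, the same sign convention as LogR. Because of the natural logarithm, it matches a log10 ratio only up to the constant factor ln 10 / 2.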
Conclusion
• A general-purpose tool has been developed to score system supportability
• An additional mathematical tool has been developed to compensate unbalanced systems
• The tools are system-independent and theoretically motivated rather than empirically built
• The tools are useful to reduce both false positives and false negatives
• False score reduction produces a decrease in discriminative power
• This decrease is intrinsic to the system response and can be used unambiguously for system quality assessment
• The proposed procedure for system quality assessment (degree of supportability) uses the well-known Evett scale to score system supportability
Thank You for your attention…
Questions?