A New SLT Decoder based on Confusion Networks
Nicola Bertoldi and Marcello Federico
ITC-irst - Centro per la Ricerca Scientifica e Tecnologica
I-38050 Povo (Trento), Italy
{bertoldi,federico}@itc.it
Outline

• Spoken Language Translation
• Approaches
• Confusion Network (CN)
• CN-based Translation Model
• CN-based Decoder
• Evaluation
Spoken Language Translation

• Translation of speech input
  – spontaneous speech phenomena: repetitions, hesitations
  – recognition errors: syntax, meaning

[Plot: BLEU score of the translation (38.5–42.5) vs. WER of the input transcriptions (14–21)]

• Automatic Speech Recognition and Machine Translation
  – strong correlation between recognition and translation quality
  – ASR WER decreases when a set of hypotheses is considered
  – idea: exploit more than one transcription
Statistical Spoken Language Translation

Given a speech input o in the source language, and the set F(o) of its possible transcriptions,
find the best translation through the following approximate criterion:

    e* = argmax_e Pr(e | o) ≈ argmax_e max_{f ∈ F(o)} Pr(e, f | o)

• Pr(e, f | o): speech translation model
  – acoustic and translation features

• F(o) is an ASR word graph (WG):
  – complex structure
  – huge amount of transcription hypotheses

[Figure: ASR word graph of an Italian utterance, with hundreds of nodes and arcs labelled with competing word hypotheses (era, è, vacanza, cancello, di, d', imbarco, ...)]
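The "≈" in the criterion above can be spelled out: Pr(e | o) is the marginal over all transcriptions f in F(o), and the sum is approximated by its largest term. A minimal LaTeX rendering of this step, using the same symbols as the slide (added here for clarity, not part of the original):

```latex
\begin{align*}
e^{*} &= \arg\max_{e} \Pr(e \mid o)
       = \arg\max_{e} \sum_{f \in F(o)} \Pr(e, f \mid o) \\
      &\approx \arg\max_{e} \max_{f \in F(o)} \Pr(e, f \mid o)
\end{align*}
```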
Approaches

• 1-best Decoder: a text MT system translates only the best transcription of the ASR. No use of
  multiple transcriptions.

• N-best Decoder: N hypotheses are translated by a text MT decoder and reranked according to
  ASR scores, e.g. of acoustic and source language models. It does not take advantage of overlaps
  among the N-best.

• Finite State Transducer: both ASR and MT models are merged into one finite-state network and a
  transducer decodes the input speech signal in one shot. Difficult scaling up to large domains.

• Confusion Network Decoder: an approximate WG is extracted from the ASR output and is directly
  translated. It exploits overlaps among hypotheses.
SLT system

[Diagram: speech signal → speech decoding (ASR) → transcription hypotheses (1-best "hola mundo", Word Graph, N-best, CN) → filter → MT input → MT decoding → best translation "hello world"]
Confusion Network

• A Confusion Network (CN) approximates a WG by shrinking it into a unifilar WG (Mangu 1999)
• Representation through a compact table
• Each path corresponds to a hypothesis
• Posterior probabilities for single words
• Likelihood for each hypothesis
• Possible insertion of ε-words (empty words)
• CN contains more paths than the WG

[Figure: the ASR word graph of the example utterance and the confusion network derived from it]

Example of the compact table (one column per position, word alternatives with their posterior
probabilities; eps denotes the empty word):

  era 0.997 | cancello 0.995 | eps 0.999 | di 0.615 | imbarco 0.999
  è   0.002 | vacanza  0.004 | la  0.001 | d' 0.376 | bar     0.001
  ...       | ...            | l'  0.002 | ...      | ...
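A minimal sketch of the compact-table representation, assuming a toy CN whose values and column layout follow the example table above (illustrative only, not the actual system): each column maps word alternatives, including the empty word "eps", to posterior probabilities; the likelihood of a hypothesis is the product of the chosen posteriors, and the number of paths is the product of the column sizes.

```python
from math import prod

# A confusion network as a list of columns; each column maps a word
# alternative (including the empty word "eps") to its posterior probability.
cn = [
    {"era": 0.997, "è": 0.002},
    {"cancello": 0.995, "vacanza": 0.004},
    {"eps": 0.999, "la": 0.001, "l'": 0.002},
    {"di": 0.615, "d'": 0.376},
    {"imbarco": 0.999, "bar": 0.001},
]

def path_likelihood(cn, path):
    """Likelihood of one hypothesis: product of the chosen word posteriors."""
    return prod(column[word] for column, word in zip(cn, path))

def num_paths(cn):
    """A CN contains as many paths as the product of its column sizes."""
    return prod(len(column) for column in cn)

consensus = [max(column, key=column.get) for column in cn]   # top word per column
print(consensus)                        # ['era', 'cancello', 'eps', 'di', 'imbarco']
print(path_likelihood(cn, consensus))   # ≈ 0.609
print(num_paths(cn))                    # 2 * 2 * 3 * 2 * 2 = 48 paths
```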
Phrase-based Translation Model

• Phrase: sequence of consecutive words

• Alignment: map between CN and target phrases;
  one word per column aligned with a target phrase

• Search criterion:

    ẽ* ≈ argmax_ẽ max_{a ∈ A(G,ẽ)} Pr(ẽ, a | G)

• Pr(ẽ, a | G) is a log-linear phrase-based model

[Figure: alignment between the CN columns (source word alternatives f_jk) and the target words e_1 ... e_6, with phrase groupings such as e2 # e3 and the NULL word]
Log-Linear Phrase-based Translation Model

The conditional distribution Pr(ẽ, a | G) is determined through suitable real-valued feature functions
h_r(ẽ, a | G), r = 1 ... R, and takes the parametric form:

    p_λ(ẽ, a | G) ∝ exp( Σ_{r=1}^{R} λ_r h_r(ẽ, a | G) )     (1)

Feature Functions:

• Language model: 3-gram LM
• Fertility models: for target phrases and the NULL word
• Distortion models: reordering of phrases and the NULL word
• Lexicon models: phrase-based
• Likelihood of the path within G
• True length of the path, disregarding ε-words

[Figure: the same CN-to-target-phrase alignment example as on the previous slide]
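A minimal sketch of how a hypothesis is scored under equation (1), with hypothetical feature values and weights (the actual decoder's features and tuned weights are not shown here): since only the ranking matters, the normalization constant is ignored and the unnormalized log-score is simply the weighted sum of the feature functions.

```python
def loglinear_score(feature_values, weights):
    """Unnormalized log-score: sum_r lambda_r * h_r(e~, a | G), as in Eq. (1)."""
    assert len(feature_values) == len(weights)
    return sum(lam * h for lam, h in zip(weights, feature_values))

# Hypothetical feature values h_r for one (translation, alignment) pair, in the
# order listed above: LM, fertility, distortion, lexicon, CN-path likelihood, length.
h_values = [-12.3, -1.0, -2.0, -8.7, -0.4, 5.0]
lambdas  = [1.0, 0.2, 0.5, 1.0, 0.8, 0.1]   # feature weights, e.g. tuned on a dev set

print(loglinear_score(h_values, lambdas))   # higher is better; exp(.) ∝ p_lambda(e~, a | G)
```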
Process for generating a translation hypothesis

1. choose how many columns
2. choose which consecutive columns
3. choose a word for each column
4. choose a target phrase
5. split the target phrase into words
5a. optionally leave words untranslated (aligned to the NULL word)
6. compute the score

Translation: e1 e2 e3 e4 e5 e6 e7
Score: s1 + s2 + s3 + s4 + s0

[Figure: step-by-step construction over the source CN — three source phrases are picked from spans of consecutive columns (with ε entries), translated into the target phrases ẽ1 ... ẽ4 (e.g. e2 # e3 # e4, e6 # e7), and split into the target words e1 ... e7]
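A minimal sketch of one expansion step of this generative process, using a hypothetical phrase table and the toy CN from earlier; it covers steps 1–4 and 6 (the language-model scoring that follows step 5 is omitted). All names and values are illustrative.

```python
from itertools import product
from math import log

# Hypothetical phrase table: source phrase (tuple of words) -> [(target phrase, log prob), ...]
phrase_table = {
    ("di", "imbarco"): [(("boarding",), -0.7), (("of", "boarding"), -1.6)],
}

def expand(cn, start, length, phrase_table):
    """One expansion covering `length` consecutive columns from `start` (steps 1-2)."""
    columns = cn[start:start + length]
    for choice in product(*columns):                          # step 3: one word per column
        source = tuple(w for w, _ in choice if w != "eps")    # empty words are skipped
        log_cn = sum(log(p) for _, p in choice)               # contribution of the CN path
        for target, log_tp in phrase_table.get(source, []):   # step 4: pick a target phrase
            yield target, log_tp + log_cn                     # step 6: partial score

# CN columns as lists of (word, posterior) pairs
cn = [
    [("era", 0.997), ("è", 0.002)],
    [("di", 0.615), ("d'", 0.376)],
    [("imbarco", 0.999), ("bar", 0.001)],
]
for target, score in expand(cn, start=1, length=2, phrase_table=phrase_table):
    print(target, round(score, 2))   # ('boarding',) -1.19  and  ('of', 'boarding') -2.09
```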
Decoder

• Generative translation process
• Synchronous on output phrases
• Dynamic programming
• Beam search: deletion of less promising partial translations
• Reordering constraints: reduction of possible alignments
• Lexicon pruning: no more than 30 translations per phrase
• Confusion network pruning: removal of less confident words
• Word graph generation: representation of the whole search space
• N-best extraction: multiple translations
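A minimal sketch of the dynamic-programming recombination and beam pruning listed above, with a simplified hypothesis state (columns already covered plus the last target words kept for the LM); the real decoder's data structures and state definition differ.

```python
def beam_prune(partials, beam_size=100):
    """Recombine by DP state, then keep only the most promising partial translations."""
    best = {}
    for p in partials:
        # Dynamic programming: hypotheses with the same state (covered columns,
        # last target words needed by the 3-gram LM) are recombined.
        state = (p["covered"], p["last_words"])
        if state not in best or p["score"] > best[state]["score"]:
            best[state] = p
    # Beam search: delete the less promising partial translations.
    return sorted(best.values(), key=lambda p: p["score"], reverse=True)[:beam_size]

# Hypothetical partial hypotheses
partials = [
    {"covered": frozenset({3, 4}), "last_words": ("boarding", "gate"), "score": -3.2},
    {"covered": frozenset({3, 4}), "last_words": ("boarding", "gate"), "score": -4.1},
    {"covered": frozenset({0}),    "last_words": ("it",),              "score": -1.0},
]
print(beam_prune(partials, beam_size=2))   # the lower-scoring duplicate state is dropped
```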
N-best-based SLT system

• relies on a text-based decoder: a simplified version of the CN-based decoder
• translates all N-best transcriptions separately
• adds the acoustic and source LM scores provided with the N-best transcriptions
• reranks the outputs
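A minimal sketch of this reranking scheme with hypothetical scores and weights: each translated transcription keeps its MT score, the ASR acoustic and source-LM scores that come with the N-best list are added with interpolation weights, and the outputs are re-sorted.

```python
def rerank(nbest_translations, w_mt=1.0, w_ac=0.05, w_lm=0.5):
    """Re-sort translated N-best transcriptions by a weighted sum of MT and ASR scores."""
    def total(h):
        return (w_mt * h["mt_score"]
                + w_ac * h["acoustic_score"]
                + w_lm * h["source_lm_score"])
    return sorted(nbest_translations, key=total, reverse=True)

# Hypothetical entries: one translation per ASR transcription, with its ASR scores.
nbest_translations = [
    {"translation": "it was the boarding gate", "mt_score": -10.2,
     "acoustic_score": -2300.0, "source_lm_score": -35.1},
    {"translation": "it was the boarding bar",  "mt_score": -9.8,
     "acoustic_score": -2345.0, "source_lm_score": -38.4},
]
print(rerank(nbest_translations)[0]["translation"])   # "it was the boarding gate"
```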
Evaluation

• Shared Task T3: integration of ASR and MT
• Input: human, automatic, N-best, Confusion Networks
• Automatic evaluation: BLEU score, case insensitive

                           Train        Dev           Test
Sentences                  1.2M         2,643         1,073
Running words              31M / 30M    20K / 23K     18.9K / 19.3K
Vocabulary                 140K / 94K   2.9K / 2.6K   3.3K / 2.8K
Best transcription WER     —            11.77%        14.90%
Results

                  DEV                                TEST
input      size    WER    BLEU   time       size     WER    BLEU    time
human         1    0      45.78   0.6          1     0      40.84     1.7
1-bst         1   11.77   40.17   0.6          1    14.60   36.64     2.1
5-bst         4    8.12   40.63   2.8          5    11.90   36.47    10.5
10-bst        8    6.99   40.83   5.3          9    11.02   36.75    20.4
20-bst       13    6.19   41.03   9.8         16    10.20   36.55    38.9
50-bst       25    5.40   40.85  20.6         34     9.47   36.66    84.2
100-bst      38    5.07   40.87  33.2         56     9.09   36.68   135.3
cn-p00        1   11.67   40.30   4.0          1    14.46   36.54    28.4
cn-p50        4    9.42   41.06   5.8         32    11.86   37.14    31.2
cn-p55       13    8.93   41.21   6.3        150    11.32   37.23    34.7
cn-p60      194    8.41   41.24   6.7      1,284    10.71   37.21    37.9
cn-p65    1,359    7.91   41.21   7.4      9,816    10.16   37.05    43.9
cn-p70   15,056    7.53   41.23  27.4    228,461     9.71   37.14    54.6

Observations:
• about 10% BLEU decrement due to ASR, comparable to the ASR WER
• N-best: few distinct transcriptions, difficult to improve
• CN slightly better than N-best
• CN contains more hypotheses, at a higher ASR WER
• CN is more efficient
Future plan

• generation of richer CNs
  – with lower WER
  – with limited size
• introduction of other features related to input:
  – source LM: reliability of a path
• experiment on a more difficult task (higher ASR WER)
• decoding the whole ASR WG
Thanks for your attention!
References
[1] Ney, “Speech Translation: Coupling of Recognition and Translation”. ICASSP 1999.
[2] Bangalore and Riccardi, “Stochastic finite-state models for spoken language machine translation”. Machine Translation, 17(3),
2002.
[3] Zhang et al., “A Unified Approach in Speech-to-Speech Translation: Integrating Features of Speech Recognition and Machine
Translation”. COLING 2004.
[4] Casacuberta et al., “Some approaches to statistical and finite-state speech-to-speech translation”. Computer Speech and
Language, 18, 2004.
[5] Mangu et al., “Finding consensus among words: Lattice-based word error minimization”. ISCA ECSCT 1999.
[6] Quan et al., “Integrated n-best re-ranking for spoken language translation”. Interspeech 2005.
[7] Cettolo et al., “A look inside the ITC-irst SMT system”. MT Summit X 2005.
[8] Federico and Bertoldi, “A word-to-phrase statistical translation model”. Transactions on Speech and Language Processing.
2(2). 2005.
[9] Bertoldi and Federico, “A New Decoder for Spoken Language Translation based on Confusion Networks”. ASRU 2005.
ITC-irst SLT system Architecture

• different input types: text, N-best, Confusion Networks
• two-step decoder
• rescoring with additional features
• reranking with optimized weights

[Diagram: speech signal → speech decoding (ASR) → speech hypotheses as a WG, from which the 1-best text, the N-best list, and the confusion network are extracted → MT translation decoding → translation hypotheses (WG / N-best) → rescoring → best solution]