A New SLT Decoder based on Confusion Networks

Nicola Bertoldi and Marcello Federico
ITC-irst - Centro per la Ricerca Scientifica e Tecnologica
I-38050 Povo (Trento), Italy
{bertoldi,federico}@itc.it

Outline
• Spoken Language Translation
• Approaches
• Confusion Network (CN)
• CN-based Translation Model
• CN-based Decoder
• Evaluation

Spoken Language Translation
• Translation of speech input
  – spontaneous speech phenomena: repetitions, hesitations
  – recognition errors: syntax, meaning
• Automatic Speech Recognition and Machine Translation
  – strong correlation between recognition and translation quality
    [Plot: BLEU score (about 38.5–42.5) decreasing as the WER of the transcriptions grows from 14% to 21%]
  – the WER of the best transcription within a set of ASR hypotheses is lower than the 1-best WER
  – idea: exploit multiple transcriptions

Statistical Spoken Language Translation
Given a speech input o in the source language, and the set F(o) of its possible transcriptions, find the best translation through the following approximate criterion:

    e* = argmax_e Pr(e | o) ≈ argmax_e max_{f ∈ F(o)} Pr(e, f | o)
• Pr(e, f | o): speech translation model
  – acoustic and translation features
• F(o) is an ASR word graph (WG):
  – complex structure
  – huge amount of transcription hypotheses
  [Figure: ASR word graph of an Italian utterance, with hundreds of states and arcs labelled with word hypotheses and background (@BG) symbols]

Approaches
• 1-best Decoder: a text MT system translates only the best ASR transcription. No use of multiple transcriptions.
• N-best Decoder: N hypotheses are translated by a text MT decoder and reranked according to ASR scores, e.g. those of the acoustic and source language models. It does not take advantage of overlaps among the N-best (a toy sketch follows this list).
• Finite State Transducer: the ASR and MT models are merged into one finite-state network and a transducer decodes the input speech signal in one shot. Difficult to scale up to large domains.
• Confusion Network Decoder: an approximate WG is extracted from the ASR output and is directly translated. It exploits overlaps among hypotheses.
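The snippet below is a minimal, hypothetical sketch of the N-best decoder idea, not the ITC-irst implementation: each transcription hypothesis is translated by a text MT decoder, its MT score is combined log-linearly with the ASR acoustic and source LM scores, and the best-scoring translation is returned. The `translate` callback, the feature weights, and all scores are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class AsrHypothesis:
    """One entry of an ASR N-best list (scores in the log domain)."""
    words: List[str]
    acoustic_score: float
    source_lm_score: float

def translate_nbest(
    nbest: List[AsrHypothesis],
    translate: Callable[[List[str]], Tuple[List[str], float]],
    w_mt: float = 1.0, w_ac: float = 0.1, w_lm: float = 0.5,
) -> List[str]:
    """Translate every transcription with a text MT decoder, add the ASR
    acoustic and source LM scores, and return the best-scoring translation."""
    best_translation: List[str] = []
    best_score = float("-inf")
    for hyp in nbest:
        target_words, mt_score = translate(hyp.words)
        score = w_mt * mt_score + w_ac * hyp.acoustic_score + w_lm * hyp.source_lm_score
        if score > best_score:
            best_translation, best_score = target_words, score
    return best_translation

# Dummy MT decoder for illustration only: "translates" by upper-casing the words.
dummy_mt = lambda words: ([w.upper() for w in words], -0.1 * len(words))
print(translate_nbest([AsrHypothesis(["hola", "mundo"], -120.0, -8.5)], dummy_mt))
```

Note that nothing in this loop shares work between hypotheses, which is exactly the inefficiency the CN decoder addresses.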
SLT system
[Diagram: speech signal → ASR (speech decoding) → transcription hypotheses (1-best, e.g. "hola mundo"; word graph; N-best; CN) → filter → MT input → MT decoding → best translation, e.g. "hello world"]

Confusion Network
• A Confusion Network (CN) approximates a WG by shrinking it into a unifilar WG (Mangu 1999)
  [Figure: an ASR word graph and the corresponding confusion network for the same utterance]
• Representation through a compact table of word posterior probabilities (see the sketch after this list), e.g.:
  [Table: CN columns with word posteriors, e.g. era 0.997 / è 0.002, cancello 0.995 / vacanza 0.004, di 0.615 / d' 0.376, imbarco 0.999, plus low-probability alternatives such as la, l', bar at 0.001–0.002]
• Each path corresponds to a transcription hypothesis
• Posterior probabilities for single words
• Possible insertion of ε-words
• CN contains more paths than the WG
• Likelihood for each hypothesis
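As a rough illustration of the compact-table representation and of path likelihoods, here is a sketch assuming a plain Python encoding (not the authors' implementation). The posteriors for the la/l' column are made up, the other numbers are taken from the example table above; "" stands for the ε-word.

```python
import math
from typing import List, Tuple

# A confusion network: one list of (word, posterior) alternatives per column.
# "" stands for the empty word ε, which lets a path skip the column.
ConfusionNetwork = List[List[Tuple[str, float]]]

cn_example: ConfusionNetwork = [
    [("era", 0.997), ("è", 0.002), ("", 0.001)],
    [("la", 0.60), ("l'", 0.30), ("", 0.10)],          # illustrative numbers
    [("cancello", 0.995), ("vacanza", 0.004), ("", 0.001)],
    [("di", 0.615), ("d'", 0.376), ("", 0.009)],
    [("imbarco", 0.999), ("", 0.001)],
]

def path_log_likelihood(cn: ConfusionNetwork, path: List[str]) -> float:
    """Log-likelihood of a hypothesis: sum of the log posteriors of the
    chosen word (or ε) in each column."""
    assert len(path) == len(cn)
    total = 0.0
    for column, word in zip(cn, path):
        total += math.log(dict(column)[word])
    return total

def prune(cn: ConfusionNetwork, min_posterior: float) -> ConfusionNetwork:
    """Drop words whose posterior falls below a threshold (less confident
    words), keeping at least the best word of every column."""
    pruned = []
    for column in cn:
        kept = [wp for wp in column if wp[1] >= min_posterior]
        pruned.append(kept or [max(column, key=lambda wp: wp[1])])
    return pruned

# Likelihood of the path "era la cancello di imbarco" (no column skipped):
print(path_log_likelihood(cn_example, ["era", "la", "cancello", "di", "imbarco"]))
# With a 0.5 threshold only the best word of each column survives:
print([len(col) for col in prune(cn_example, 0.5)])
```

Pruning by a posterior threshold as above corresponds to the "removal of less confident words" mentioned later for the decoder.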
Phrase-based Translation Model
• Phrase: sequence of consecutive words
• Alignment: map between the CN and target phrases; one word per column, aligned with a target phrase
  [Figure: example alignment between CN column words f_jk (including NULL) and target words e_1 ... e_6 grouped into phrases]
• Search criterion:

    ẽ* ≈ argmax_ẽ max_{a ∈ A(G, ẽ)} Pr(ẽ, a | G)

• Pr(ẽ, a | G) is a log-linear phrase-based model

Log-Linear Phrase-based Translation Model
The conditional distribution Pr(ẽ, a | G) is determined through suitable real-valued feature functions h_r(ẽ, a | G), r = 1, ..., R, and takes the parametric form:
    p_λ(ẽ, a | G) ∝ exp Σ_{r=1}^{R} λ_r h_r(ẽ, a | G)        (1)

Feature Functions (a toy scoring snippet follows this list):
• Language model: 3-gram target LM
• Fertility models: for target phrases and the NULL word
• Distortion models: reordering of phrases and the NULL word
• Lexicon models: phrase-based
• Likelihood of the path within G
• True length of the path, disregarding ε-words
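The following toy snippet just evaluates the unnormalized log of Eq. (1): the score of a (translation, alignment) pair is the weighted sum of its feature function values, and the normalization constant can be ignored when comparing hypotheses for the arg max. The feature names, values and weights are placeholders, not the system's actual ones.

```python
from typing import Dict

def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Unnormalized log of p_lambda(e~, a | G) ∝ exp(sum_r lambda_r * h_r):
    a weighted sum of feature function values."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values h_r(e~, a | G) for one (translation, alignment) pair
features = {
    "target_lm": -12.4,          # 3-gram target language model
    "fertility": -3.1,           # fertility of target phrases / NULL word
    "distortion": -2.0,          # reordering of phrases and NULL word
    "lexicon": -8.7,             # phrase-based lexicon models
    "cn_path_likelihood": -1.5,  # likelihood of the source path within G
    "path_length": 5.0,          # true length of the path, ε-words excluded
}
weights = {name: 1.0 for name in features}  # placeholder lambdas (to be tuned)

print(log_linear_score(features, weights))
```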
Process for generating a translation hypothesis
1. choose how many columns
2. choose which consecutive columns
3. choose a word for each column
4. choose a target phrase
5. split the target phrase into words
5a. optionally leave words untranslated (aligned to NULL)
6. compute the score
  [Figure: step-by-step construction of the translation — source phrases f̃_1, f̃_2, f̃_3 picked from the CN columns, mapped to target phrases ẽ_1 ... ẽ_4, with the score accumulated as s1 + s2 + s3 + s4 + s0]
A toy sketch of one expansion step of this process is given below.
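This is a heavily simplified sketch of a single expansion step of the generative process: monotone, with a toy phrase table and a toy score that only combines CN posteriors with a phrase translation score. All names and numbers are illustrative assumptions, not the actual decoder.

```python
import math
from typing import Dict, List, Tuple

ConfusionNetwork = List[List[Tuple[str, float]]]  # columns of (word, posterior)

# Hypothetical phrase table: source phrase -> (target phrase, log probability)
phrase_table: Dict[Tuple[str, ...], Tuple[str, float]] = {
    ("era",): ("it was", -0.4),
    ("cancello", "di", "imbarco"): ("boarding gate", -0.7),
}

def expand(cn: ConfusionNetwork, start: int, words: List[str],
           target_phrase: str, phrase_log_prob: float) -> Tuple[List[str], float]:
    """One step of the generative process:
    (1-2) the caller has chosen len(words) consecutive columns starting at `start`,
    (3)   `words` is the chosen word for each of those columns,
    (4-5) `target_phrase` is the chosen translation, split here into target words,
    (6)   the partial score adds the log CN posteriors of the chosen words
          and the phrase translation score."""
    score = phrase_log_prob
    for offset, word in enumerate(words):
        score += math.log(dict(cn[start + offset])[word])
    return target_phrase.split(), score

cn: ConfusionNetwork = [
    [("era", 0.997), ("è", 0.002), ("", 0.001)],
    [("cancello", 0.995), ("vacanza", 0.004), ("", 0.001)],
    [("di", 0.615), ("d'", 0.376), ("", 0.009)],
    [("imbarco", 0.999), ("", 0.001)],
]

# Cover columns 1-3 with the source phrase "cancello di imbarco":
src = ("cancello", "di", "imbarco")
tgt, log_prob = phrase_table[src]
print(expand(cn, 1, list(src), tgt, log_prob))
```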
Decoder
• Generative translation process
• Synchronous on output phrases
• Dynamic programming
• Beam search: deletion of less promising partial translations (a skeletal example follows this list)
• Reordering constraints: reduction of the possible alignments
• Lexicon pruning: no more than 30 translations per phrase
• Confusion network pruning: removal of less confident words
• Word graph generation: representation of the whole search space
• N-best extraction: multiple translations
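For intuition only, here is a skeletal CN decoder in the spirit of the bullets above: monotone (no reordering), it keeps at most `beam_size` partial hypotheses per number of covered columns and drops the rest (the "deletion of less promising partial translations"), and for each covered span it greedily picks the most probable word per column as a further simplification. The phrase table, scores and names are assumptions, not the actual ITC-irst decoder.

```python
import math
from typing import Dict, List, Tuple

ConfusionNetwork = List[List[Tuple[str, float]]]
PhraseTable = Dict[Tuple[str, ...], List[Tuple[str, float]]]

def beam_search(cn: ConfusionNetwork, phrase_table: PhraseTable,
                beam_size: int = 4, max_phrase_len: int = 3) -> List[str]:
    """Monotone beam search over CN columns: beams[c] holds partial
    hypotheses (target_words, score) covering the first c columns;
    only the beam_size best ones are expanded (beam pruning)."""
    beams: List[List[Tuple[List[str], float]]] = [[] for _ in range(len(cn) + 1)]
    beams[0].append(([], 0.0))
    for covered in range(len(cn)):
        # keep only the most promising partial translations
        beams[covered] = sorted(beams[covered], key=lambda h: -h[1])[:beam_size]
        for target, score in beams[covered]:
            for span in range(1, min(max_phrase_len, len(cn) - covered) + 1):
                # choose one word per covered column (simplification: the best one)
                words = [max(cn[covered + i], key=lambda wp: wp[1])[0]
                         for i in range(span)]
                cn_score = sum(math.log(dict(cn[covered + i])[w])
                               for i, w in enumerate(words))
                key = tuple(w for w in words if w)   # drop ε before phrase lookup
                for tgt_phrase, tm_score in phrase_table.get(key, []):
                    beams[covered + span].append(
                        (target + tgt_phrase.split(), score + cn_score + tm_score))
    best = max(beams[len(cn)], key=lambda h: h[1], default=([], float("-inf")))
    return best[0]

cn: ConfusionNetwork = [
    [("era", 0.997), ("è", 0.002)],
    [("cancello", 0.995), ("vacanza", 0.004)],
    [("di", 0.615), ("d'", 0.376)],
    [("imbarco", 0.999), ("", 0.001)],
]
phrase_table: PhraseTable = {
    ("era",): [("it was", -0.4)],
    ("cancello",): [("gate", -1.2)],
    ("di",): [("of", -0.3)],
    ("imbarco",): [("boarding", -0.9)],
    ("cancello", "di", "imbarco"): [("boarding gate", -0.7)],
}
print(beam_search(cn, phrase_table))   # -> ['it', 'was', 'boarding', 'gate']
```

Indexing the beams by the number of covered columns is what makes the search synchronous on the input coverage; a real decoder would also recombine hypotheses sharing the same LM state (dynamic programming) and apply the lexicon and CN pruning listed above.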
N-best-based SLT system
• relies on a text-based decoder: a simplified version of the CN-based decoder
• translates all N-best transcriptions separately
• adds the acoustic and source LM scores provided with the N-best transcriptions
• reranks the outputs

Evaluation
• Shared Task T3: integration of ASR and MT
• Input: human, automatic, N-best, Confusion Networks
• Automatic evaluation: BLEU score, case insensitive

                           Train         Dev           Test
Sentences                  1.2M          2,643         1,073
Running words              31M / 30M     20K / 23K     18.9K / 19.3K
Vocabulary                 140K / 94K    2.9K / 2.6K   3.3K / 2.8K
Best transcription WER     —             11.77%        14.90%

Results

            DEV                              TEST
input       size    WER    BLEU   time       size     WER    BLEU   time
human       1       0      45.78  0.6        1        0      40.84  1.7
1-bst       1       11.77  40.17  0.6        1        14.60  36.64  2.1
5-bst       4       8.12   40.63  2.8        5        11.90  36.47  10.5
10-bst      8       6.99   40.83  5.3        9        11.02  36.75  20.4
20-bst      13      6.19   41.03  9.8        16       10.20  36.55  38.9
50-bst      25      5.40   40.85  20.6       34       9.47   36.66  84.2
100-bst     38      5.07   40.87  33.2       56       9.09   36.68  135.3
cn-p00      1       11.67  40.30  4.0        1        14.46  36.54  28.4
cn-p50      4       9.42   41.06  5.8        32       11.86  37.14  31.2
cn-p55      13      8.93   41.21  6.3        150      11.32  37.23  34.7
cn-p60      194     8.41   41.24  6.7        1,284    10.71  37.21  37.9
cn-p65      1,359   7.91   41.21  7.4        9,816    10.16  37.05  43.9
cn-p70      15,056  7.53   41.23  27.4       228,461  9.71   37.14  54.6

• about a 10% BLEU decrement due to ASR errors, comparable to the ASR WER
• N-best: few distinct transcriptions, difficult to improve over the 1-best
• CN slightly better than N-best
• CN contains more hypotheses, with higher ASR WER
• CN decoding is more efficient

Future plan
• generation of richer CNs
  – with lower WER
  – with limited size
• introduction of other features related to the input:
  – source LM: reliability of a path
• experiments on a more difficult task (higher ASR WER)
• decoding the whole ASR WG

Thanks for your attention!

References
[1] Ney, "Speech Translation: Coupling of Recognition and Translation". ICASSP 1999.
[2] Bangalore and Riccardi, "Stochastic finite-state models for spoken language machine translation". Machine Translation, 17(3), 2002.
[3] Zhang et al., "A Unified Approach in Speech-to-Speech Translation: Integrating Features of Speech Recognition and Machine Translation". COLING 2004.
[4] Casacuberta et al., "Some approaches to statistical and finite-state speech-to-speech translation". Computer Speech and Language, 18, 2004.
[5] Mangu et al., "Finding consensus among words: Lattice-based word error minimization". ISCA ECSCT 1999.
[6] Quan et al., "Integrated n-best re-ranking for spoken language translation". Interspeech 2005.
[7] Cettolo et al., "A look inside the ITC-irst SMT system". MT Summit X, 2005.
[8] Federico and Bertoldi, "A word-to-phrase statistical translation model". Transactions on Speech and Language Processing, 2(2), 2005.
[9] Bertoldi and Federico, "A New Decoder for Spoken Language Translation based on Confusion Networks". ASRU 2005.

ITC-irst SLT system Architecture
• different input types: text, N-best, Confusion Networks
• two-step decoder
• rescoring with additional features
• reranking with optimized weights
[Diagram: speech signal → ASR (speech decoding) → speech hypotheses (WG, with 1-best, N-best and confusion network extracted from the WG) → MT decoding → translation hypotheses (best solution, WG, N-best) → rescoring → best translation]