# Benchmarks

We provide several pretrained models together with their benchmark results.

## JoeyS2T

For the ASR task, we report WER (lower is better); for the MT and ST tasks, we report BLEU (higher is better).
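For reference, WER is the word-level edit distance (substitutions + insertions + deletions) normalized by the reference length. The sketch below is an illustration only; the actual evaluation pipeline may apply additional tokenization and text normalization:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: edit distance / reference length * 100."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return 100.0 * dp[len(ref)][len(hyp)] / len(ref)

# e.g. one deleted word out of a six-word reference
score = wer("the cat sat on the mat", "the cat sat on mat")
```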
### LibriSpeech 100h

| System | Architecture | dev-clean | dev-other | test-clean | test-other | #params | download |
|---|---|---|---|---|---|---|---|
| - | BiLSTM | 14.00 | 37.02 | 14.85 | 39.95 | - | - |
| - | Transformer | 10.3 | 24.0 | 11.2 | 24.9 | - | - |
| - | Transformer | 8.1 | 20.2 | 8.4 | 20.5 | - | - |
| - | Conformer | 6.3 | 17.4 | 6.5 | 17.3 | - | - |
| JoeyS2T | Transformer | 10.18 | 23.39 | 11.58 | 24.31 | 93M | librispeech100h.tar.gz (948M) |
### LibriSpeech 960h

| System | Architecture | dev-clean | dev-other | test-clean | test-other | #params | download |
|---|---|---|---|---|---|---|---|
| - | BiLSTM | 1.9 | 4.4 | 2.1 | 4.9 | - | - |
| - | Conformer | 2.3 | 6.1 | 2.6 | 6.0 | - | - |
| - | Conformer | 2.13 | 5.51 | 2.31 | 5.61 | 165M | - |
| - | Transformer | 3.23 | 8.01 | 3.52 | 7.83 | 71M | - |
| - | Conformer | 3.17 | 8.87 | 3.39 | 8.57 | 94M | - |
| JoeyS2T | Transformer | 10.18 | 23.39 | 11.58 | 24.31 | 102M | librispeech960h.tar.gz (1.1G) |
### MuST-C ASR pretraining

| System | train | eval | dev | tst-COMMON | tst-HE | #params | download |
|---|---|---|---|---|---|---|---|
| - | v1 | v1 | - | 27.0 | - | - | - |
| - | v1 | v1 | - | 12.70 | - | - | - |
| - | v1 | v1 | 13.07 | 12.72 | 10.93 | 29.5M | - |
| - | v1 | v2 | 9.11 | 11.88 | 10.43 | 29.5M | - |
| JoeyS2T | v2 | v1 | 18.09 | 18.66 | 14.97 | 96M | - |
| JoeyS2T | v2 | v2 | 9.77 | 12.51 | 10.73 | 96M | mustc_asr.tar.gz (940M) |
### MuST-C MT pretraining

| System | train | eval | dev | tst-COMMON | tst-HE | #params | download |
|---|---|---|---|---|---|---|---|
| - | v1 | v1 | - | 25.3 | - | - | - |
| - | v1 | v1 | - | 29.69 | - | - | - |
| - | v1 | v1 | - | 27.63 | - | - | - |
| JoeyS2T | v2 | v1 | 21.85 | 23.15 | 20.37 | 66.5M | - |
| JoeyS2T | v2 | v2 | 26.99 | 27.61 | 25.26 | 66.5M | mustc_mt.tar.gz (729M) |
### MuST-C end-to-end ST

| System | train | eval | dev | tst-COMMON | tst-HE | #params | download |
|---|---|---|---|---|---|---|---|
| - | v1 | v1 | - | 17.3 | - | - | - |
| - | v1 | v1 | - | 20.67 | - | - | - |
| - | v1 | v1 | - | 22.91 | - | - | - |
| - | v1 | v2 | 22.05 | 22.70 | 21.70 | 31M | - |
| JoeyS2T | v2 | v1 | 21.06 | 20.92 | 21.78 | 96M | - |
| JoeyS2T | v2 | v2 | 24.26 | 23.86 | 23.86 | 96M | mustc_st.tar.gz (952M) |
sacrebleu signature: `nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.1.0`
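The signature above records how BLEU was computed: one reference, mixed case, 13a tokenization, exponential smoothing, sacrebleu version 2.1.0. As a rough illustration of the metric itself, here is a self-contained corpus-BLEU sketch (clipped n-gram precisions up to 4-grams, plus a brevity penalty). It uses plain whitespace splitting instead of the 13a tokenizer and no smoothing, so its scores will not exactly match sacrebleu's:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU: geometric mean of clipped n-gram precisions x brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()  # whitespace "tokenization" (not 13a)
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            matches[n - 1] += sum((ngrams(h, n) & ngrams(r, n)).values())  # clipped
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # sacrebleu's exponential smoothing would kick in here instead
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_prec)
```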
**Note:** For MuST-C, we trained our models on the English-German subset of version 2, and evaluated them on both the version 1 and version 2 tst-COMMON and tst-HE splits. See benchmarks.ipynb to replicate these results.
## JoeyNMT v2.x

### IWSLT14 de/en/fr multilingual
We trained this multilingual model with JoeyNMT v2.3.0 using DDP.
| Direction | Architecture | Tokenizer | dev | test | #params | download |
|---|---|---|---|---|---|---|
| en->de | Transformer | sentencepiece | - | 28.88 | 200M | - |
| de->en | Transformer | sentencepiece | - | 35.28 | 200M | - |
| en->fr | Transformer | sentencepiece | - | 38.86 | 200M | - |
| fr->en | Transformer | sentencepiece | - | 40.35 | 200M | - |
sacrebleu signature: `nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.4.0`
### WMT14 ende / deen

We trained the models with JoeyNMT v2.1.0 from scratch. Cf. the WMT14 deen leaderboard on Papers with Code.
| Direction | Architecture | Tokenizer | dev | test | #params | download |
|---|---|---|---|---|---|---|
| en->de | Transformer | sentencepiece | 24.36 | 24.38 | 60.5M | wmt14_ende.tar.gz (766M) |
| de->en | Transformer | sentencepiece | 30.60 | 30.51 | 60.5M | wmt14_deen.tar.gz (766M) |
sacrebleu signature: `nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.2.0`
## JoeyNMT v1.x

**Warning:** The following models were trained with JoeyNMT v1.x and decoded with JoeyNMT v2.0. See config_v1.yaml and config_v2.yaml in the linked tar.gz, respectively. JoeyNMT v1.x benchmarks are archived here.
### IWSLT14 deen
Pre-processing with Moses decoder tools as in this script.
| Direction | Architecture | Tokenizer | dev | test | #params | download |
|---|---|---|---|---|---|---|
| de->en | RNN | subword-nmt | 31.77 | 30.74 | 61M | rnn_iwslt14_deen_bpe.tar.gz (672M) |
| de->en | Transformer | subword-nmt | 34.53 | 33.73 | 19M | - |
sacrebleu signature: `nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.0.0`
**Note:** For the interactive translate mode, specify `pretokenizer: "moses"` in both the src and trg `tokenizer_cfg` so that you can input raw sentences; `MosesTokenizer` and `MosesDetokenizer` will then be applied internally. For test mode, we used the preprocessed texts as input and set `pretokenizer: "none"` in the config.
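As a sketch of what such a config fragment might look like for interactive mode (key layout modeled on a JoeyNMT v2 data section; check the bundled config_v2.yaml for the exact keys used by this model, as they may differ):

```yaml
data:
  src:
    lang: "de"
    tokenizer_type: "subword-nmt"   # BPE model used for IWSLT14 deen
    tokenizer_cfg:
      pretokenizer: "moses"         # raw input: Moses (de)tokenization applied internally
  trg:
    lang: "en"
    tokenizer_type: "subword-nmt"
    tokenizer_cfg:
      pretokenizer: "moses"         # set to "none" when feeding pre-tokenized test files
```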
### Masakhane JW300 afen / enaf

We picked the pretrained models and configs (BPE codes file, etc.) from masakhane.io.
| Direction | Architecture | Tokenizer | dev | test | #params | download |
|---|---|---|---|---|---|---|
| af->en | Transformer | subword-nmt | - | 57.70 | 46M | - |
| en->af | Transformer | subword-nmt | 47.24 | 47.31 | 24M | - |
sacrebleu signature: `nrefs:1|case:mixed|eff:no|tok:intl|smooth:exp|version:2.0.0`
### JParaCrawl enja / jaen

For training, we split JParaCrawl v2 into train and dev sets and trained a model for each direction. Please check the preprocessing script here. We then tested on the KFTT test set and the WMT20 test set, respectively.
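The actual preprocessing is done by the script linked above; purely as an illustration, a random held-out split can be sketched like this (file handling omitted, and the dev-set size and seed are arbitrary assumptions, not the values used for JParaCrawl):

```python
import random

def train_dev_split(pairs, dev_size=2000, seed=42):
    """Shuffle parallel (src, trg) pairs and hold out the first dev_size as the dev set."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # fixed seed -> reproducible split
    return pairs[dev_size:], pairs[:dev_size]

# toy parallel corpus standing in for the JParaCrawl v2 (en, ja) pairs
corpus = [(f"en sentence {i}", f"ja sentence {i}") for i in range(10000)]
train, dev = train_dev_split(corpus)
```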
| Direction | Architecture | Tokenizer | kftt | wmt20 | #params | download |
|---|---|---|---|---|---|---|
| en->ja | Transformer | sentencepiece | 17.66 | 14.31 | 225M | jparacrawl_enja.tar.gz (2.3GB) |
| ja->en | Transformer | sentencepiece | 14.97 | 11.49 | 221M | jparacrawl_jaen.tar.gz (2.2GB) |
sacrebleu signatures:

- en->ja: `nrefs:1|case:mixed|eff:no|tok:ja-mecab-0.996-IPA|smooth:exp|version:2.0.0`
- ja->en: `nrefs:1|case:mixed|eff:no|tok:intl|smooth:exp|version:2.0.0`

(Note: in the WMT20 test set, newstest2020-enja has 1000 examples and newstest2020-jaen has 993 examples.)