Benchmarks#

We provide several pretrained models with their benchmark results.

JoeyS2T#

  • For the ASR task, we compute WER (lower is better); a minimal scoring sketch follows this list.

  • For the MT and ST tasks, we compute BLEU (higher is better).
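
WER here is the word-level edit distance between hypothesis and reference, divided by the reference length and reported as a percentage. The snippet below is a minimal, dependency-free sketch of that computation; the actual evaluation may apply additional text normalization.

```python
# Minimal WER sketch: word-level Levenshtein distance / reference length.
# Illustrative only; the reported numbers may use extra text normalization.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dynamic-programming table for edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# 2 edits (1 substitution, 1 deletion) over 6 reference words -> 33.33%
print(round(100 * wer("the cat sat on the mat", "the cat sit on mat"), 2))
```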

LibriSpeech 100h#

| System | Architecture | dev-clean | dev-other | test-clean | test-other | #params | download |
|---|---|---|---|---|---|---|---|
| Kahn et al. | BiLSTM | 14.00 | 37.02 | 14.85 | 39.95 | - | - |
| Laptev et al. | Transformer | 10.3 | 24.0 | 11.2 | 24.9 | - | - |
| ESPnet | Transformer | 8.1 | 20.2 | 8.4 | 20.5 | - | - |
| ESPnet | Conformer | 6.3 | 17.4 | 6.5 | 17.3 | - | - |
| JoeyS2T | Transformer | 10.18 | 23.39 | 11.58 | 24.31 | 93M | librispeech100h.tar.gz (948M) |

LibriSpeech 960h#

| System | Architecture | dev-clean | dev-other | test-clean | test-other | #params | download |
|---|---|---|---|---|---|---|---|
| Gulati et al. | BiLSTM | 1.9 | 4.4 | 2.1 | 4.9 | - | - |
| ESPnet | Conformer | 2.3 | 6.1 | 2.6 | 6.0 | - | - |
| SpeechBrain | Conformer | 2.13 | 5.51 | 2.31 | 5.61 | 165M | - |
| fairseq S2T | Transformer | 3.23 | 8.01 | 3.52 | 7.83 | 71M | - |
| fairseq wav2vec2 | Conformer | 3.17 | 8.87 | 3.39 | 8.57 | 94M | - |
| JoeyS2T | Transformer | 10.18 | 23.39 | 11.58 | 24.31 | 102M | librispeech960h.tar.gz (1.1G) |

MuST-C ASR pretraining#

| System | train | eval | dev | tst-COMMON | tst-HE | #params | download |
|---|---|---|---|---|---|---|---|
| Gangi et al. | v1 | v1 | - | 27.0 | - | - | - |
| ESPnet | v1 | v1 | - | 12.70 | - | - | - |
| fairseq S2T | v1 | v1 | 13.07 | 12.72 | 10.93 | 29.5M | - |
| fairseq S2T | v1 | v2 | 9.11 | 11.88 | 10.43 | 29.5M | - |
| JoeyS2T | v2 | v1 | 18.09 | 18.66 | 14.97 | 96M | mustc_asr.tar.gz (940M) |
| JoeyS2T | v2 | v2 | 9.77 | 12.51 | 10.73 | 96M | mustc_asr.tar.gz (940M) |

MuST-C MT pretraining#

| System | train | eval | dev | tst-COMMON | tst-HE | #params | download |
|---|---|---|---|---|---|---|---|
| Gangi et al. | v1 | v1 | - | 25.3 | - | - | - |
| Zhang et al. | v1 | v1 | - | 29.69 | - | - | - |
| ESPnet | v1 | v1 | - | 27.63 | - | - | - |
| JoeyS2T | v2 | v1 | 21.85 | 23.15 | 20.37 | 66.5M | mustc_mt.tar.gz (729M) |
| JoeyS2T | v2 | v2 | 26.99 | 27.61 | 25.26 | 66.5M | mustc_mt.tar.gz (729M) |

MuST-C end-to-end ST#

| System | train | eval | dev | tst-COMMON | tst-HE | #params | download |
|---|---|---|---|---|---|---|---|
| Gangi et al. | v1 | v1 | - | 17.3 | - | - | - |
| Zhang et al. | v1 | v1 | - | 20.67 | - | - | - |
| ESPnet | v1 | v1 | - | 22.91 | - | - | - |
| fairseq S2T | v1 | v2 | 22.05 | 22.70 | 21.70 | 31M | - |
| JoeyS2T | v2 | v1 | 21.06 | 20.92 | 21.78 | 96M | mustc_st.tar.gz (952M) |
| JoeyS2T | v2 | v2 | 24.26 | 23.86 | 23.86 | 96M | mustc_st.tar.gz (952M) |

sacrebleu signature: nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.1.0
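
The signature can be reproduced with the sacrebleu Python API. The sketch below assumes plain-text hypothesis and reference files (the file names are placeholders); the settings spelled out are sacrebleu's defaults and match the signature above.

```python
# Scoring with settings that match nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp.
# "hypotheses.txt" / "references.txt" are placeholder file names.
from sacrebleu.metrics import BLEU

with open("hypotheses.txt", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

# These are the sacrebleu defaults, written out to mirror the signature.
# For the signatures below that show case:lc, pass lowercase=True instead.
bleu = BLEU(tokenize="13a", smooth_method="exp", lowercase=False)
result = bleu.corpus_score(hyps, [refs])

print(result.score)          # the BLEU value reported in the tables
print(bleu.get_signature())  # prints the signature string for the run
```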

Note

For MuST-C, we trained our models on the English-German subset of version 2 and evaluated them on the tst-COMMON and tst-HE splits of both version 1 and version 2. See benchmarks.ipynb to replicate these results.

JoeyNMT v2.x#

IWSLT14 de/en/fr multilingual#

We trained this multilingual model with JoeyNMT v2.3.0 using DDP.
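
DDP here refers to PyTorch's DistributedDataParallel. JoeyNMT wires this up internally, so the snippet below is only a generic, minimal illustration of the mechanism (launched with torchrun), not JoeyNMT's training loop.

```python
# Generic DDP sketch, NOT JoeyNMT code.
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 512).cuda(local_rank)  # stand-in for the Transformer
    model = DDP(model, device_ids=[local_rank])          # gradients are all-reduced across ranks

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    x = torch.randn(8, 512, device=local_rank)     # stand-in for a training batch
    loss = model(x).pow(2).mean()
    loss.backward()                                # DDP synchronizes gradients here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```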

| Direction | Architecture | Tokenizer | dev | test | #params | download |
|---|---|---|---|---|---|---|
| en->de | Transformer | sentencepiece | - | 28.88 | 200M | iwslt14_prompt |
| de->en | Transformer | sentencepiece | - | 35.28 | 200M | iwslt14_prompt |
| en->fr | Transformer | sentencepiece | - | 38.86 | 200M | iwslt14_prompt |
| fr->en | Transformer | sentencepiece | - | 40.35 | 200M | iwslt14_prompt |

sacrebleu signature: nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.4.0

WMT14 ende / deen#

We trained the models with JoeyNMT v2.1.0 from scratch.

cf. the wmt14 deen leaderboard on paperswithcode

| Direction | Architecture | Tokenizer | dev | test | #params | download |
|---|---|---|---|---|---|---|
| en->de | Transformer | sentencepiece | 24.36 | 24.38 | 60.5M | wmt14_ende.tar.gz (766M) |
| de->en | Transformer | sentencepiece | 30.60 | 30.51 | 60.5M | wmt14_deen.tar.gz (766M) |

sacrebleu signature: nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.2.0

JoeyNMT v1.x#

Warning

The following models were trained with JoeyNMT v1.x and decoded with JoeyNMT v2.0. See config_v1.yaml and config_v2.yaml in the linked tar.gz, respectively. JoeyNMT v1.x benchmarks are archived here.

IWSLT14 deen#

Pre-processing was done with the Moses decoder tools, as in this script.

| Direction | Architecture | Tokenizer | dev | test | #params | download |
|---|---|---|---|---|---|---|
| de->en | RNN | subword-nmt | 31.77 | 30.74 | 61M | rnn_iwslt14_deen_bpe.tar.gz (672M) |
| de->en | Transformer | subword-nmt | 34.53 | 33.73 | 19M | transformer_iwslt14_deen_bpe.tar.gz (221M) |

sacrebleu signature: nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.0.0

Note

For the interactive translate mode, specify pretokenizer: "moses" in both the src and trg tokenizer_cfg so that you can input raw sentences; MosesTokenizer and MosesDetokenizer will then be applied internally. For test mode, we used the preprocessed texts as input and set pretokenizer: "none" in the config.
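
Under the hood, the Moses pre/detokenization behaves roughly like the sacremoses sketch below; this illustrates the effect only and is not JoeyNMT's exact code path.

```python
# Rough illustration of what pretokenizer: "moses" does around the model;
# JoeyNMT applies this internally, so this is not its exact code path.
from sacremoses import MosesDetokenizer, MosesTokenizer

mt_src = MosesTokenizer(lang="de")    # source side: raw German in
md_trg = MosesDetokenizer(lang="en")  # target side: detokenized English out

raw_src = "Das ist ein Test, oder?"
pretok_src = mt_src.tokenize(raw_src, return_str=True)
print(pretok_src)  # 'Das ist ein Test , oder ?'  -> then subword splitting, then the model

# hypothetical model output after subwords are merged back into words
hyp_tokens = ["This", "is", "a", "test", ",", "right", "?"]
print(md_trg.detokenize(hyp_tokens))  # 'This is a test, right?'
```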

Masakhane JW300 afen / enaf#

We took the pretrained models and configs (BPE codes file, etc.) from masakhane.io.

| Direction | Architecture | Tokenizer | dev | test | #params | download |
|---|---|---|---|---|---|---|
| af->en | Transformer | subword-nmt | - | 57.70 | 46M | transformer_jw300_afen.tar.gz (525M) |
| en->af | Transformer | subword-nmt | 47.24 | 47.31 | 24M | transformer_jw300_enaf.tar.gz (285M) |

sacrebleu signature: nrefs:1|case:mixed|eff:no|tok:intl|smooth:exp|version:2.0.0

JParaCrawl enja / jaen#

For training, we split JParaCrawl v2 into train and dev sets and trained our models on them. Please check the preprocessing script here; a rough sketch of the split step follows below. We then evaluated on the kftt test set and the wmt20 test set.
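
The linked preprocessing script is the authoritative recipe; the snippet below only sketches the train/dev split step, and the file name and its two-column (en \t ja) layout are placeholder assumptions.

```python
# Rough sketch of carving a dev set out of a parallel corpus.
# "jparacrawl.en-ja.tsv" and its two-column (en \t ja) layout are placeholders;
# the linked preprocessing script is the real recipe.
import random

random.seed(42)
with open("jparacrawl.en-ja.tsv", encoding="utf-8") as f:
    pairs = [line.rstrip("\n").split("\t") for line in f]

random.shuffle(pairs)
dev, train = pairs[:2000], pairs[2000:]   # the dev size here is an arbitrary choice

for rows, name in [(train, "train"), (dev, "dev")]:
    with open(f"{name}.en", "w", encoding="utf-8") as f_en, \
         open(f"{name}.ja", "w", encoding="utf-8") as f_ja:
        for en, ja in rows:
            f_en.write(en + "\n")
            f_ja.write(ja + "\n")
```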

| Direction | Architecture | Tokenizer | kftt | wmt20 | #params | download |
|---|---|---|---|---|---|---|
| en->ja | Transformer | sentencepiece | 17.66 | 14.31 | 225M | jparacrawl_enja.tar.gz (2.3GB) |
| ja->en | Transformer | sentencepiece | 14.97 | 11.49 | 221M | jparacrawl_jaen.tar.gz (2.2GB) |

sacrebleu signatures:
  • en->ja: nrefs:1|case:mixed|eff:no|tok:ja-mecab-0.996-IPA|smooth:exp|version:2.0.0

  • ja->en: nrefs:1|case:mixed|eff:no|tok:intl|smooth:exp|version:2.0.0

(Note: in the wmt20 test set, newstest2020-enja has 1000 examples and newstest2020-jaen has 993 examples.)