Welcome to JoeyS2T’s documentation!

JoeyS2T is a JoeyNMT extension for Speech-to-Text tasks such as Automatic Speech Recognition (ASR) and end-to-end Speech Translation (ST). It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch, and aims for the same simplicity and accessibility.

JoeyS2T implements the following features:

  • Transformer Encoder-Decoder

  • 1d-Conv Subsampling

  • Cross-entropy and CTC joint objective

  • Mel filterbank spectrogram extraction

  • CMVN, SpecAugment

  • WER evaluation

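To make the last point concrete, Word Error Rate (WER) is the word-level edit distance between hypothesis and reference, normalized by the reference length. The following is a minimal, self-contained sketch of that metric in plain Python, not JoeyS2T's own implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One word ("the") is missing from the hypothesis: 1 error over 6 reference words.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

In practice JoeyS2T computes WER on tokenized, lowercased text; the exact preprocessing is set in the configuration file.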
Furthermore, all functionalities of JoeyNMT v2 are also available in JoeyS2T:

  • BLEU and ChrF evaluation

  • BPE tokenization (with BPE dropout option)

  • Beam search and greedy decoding (with repetition penalty, ngram blocker)

  • Customizable initialization

  • Attention visualization

  • Learning curve plotting

  • Scoring hypotheses and references

  • Multilingual translation with language tags

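As an illustration of the ngram blocker mentioned above: during decoding, any token that would complete an n-gram already present in the partial output is excluded before the next token is chosen. The sketch below shows the idea with greedy search and a hypothetical `scores` function standing in for a real model; it is a toy, not JoeyNMT's implementation:

```python
def blocked_tokens(seq, n):
    """Tokens that would complete a repeat of an n-gram already in `seq`."""
    banned = set()
    if len(seq) < n - 1:
        return banned
    prefix = tuple(seq[len(seq) - (n - 1):])  # last n-1 generated tokens
    for i in range(len(seq) - n + 1):
        if tuple(seq[i:i + n - 1]) == prefix:
            banned.add(seq[i + n - 1])
    return banned

def greedy_decode(scores, vocab, max_len=10, no_repeat_ngram=2, eos="</s>"):
    """Greedy search that skips tokens banned by the n-gram blocker."""
    seq = []
    for _ in range(max_len):
        banned = blocked_tokens(seq, no_repeat_ngram)
        candidates = {t: scores(seq, t) for t in vocab if t not in banned}
        if not candidates:
            break
        token = max(candidates, key=candidates.get)
        if token == eos:
            break
        seq.append(token)
    return seq

# Toy "model" that always prefers "la", which would loop forever
# without the blocker.
def scores(seq, token):
    return {"la": 2.0, "di": 1.0, "</s>": 0.5}[token]

output = greedy_decode(scores, ["la", "di", "</s>"], no_repeat_ngram=2)
print(output)  # no bigram appears twice in the result
```

In JoeyNMT/JoeyS2T the blocker (and the repetition penalty) are enabled via the testing section of the YAML configuration and apply to both greedy and beam search.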
If you use JoeyS2T in a publication or thesis, please cite the following paper:

@inproceedings{ohta-etal-2022-joeys2t,
    title = "{JoeyS2T}: Minimalistic Speech-to-Text Modeling with {JoeyNMT}",
    author = "Ohta, Mayumi  and
      Kreutzer, Julia  and
      Riezler, Stefan",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-demos.6",
    pages = "50--59",
}