This paper was accepted at the EMNLP Workshop on Computational Approaches to Linguistic Code-Switching (CALCS).
Code-switching (CS), i.e., mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but have been limited to offline scenarios and to translation into one of the languages present in the source (monolingual transcription).
In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation into a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST, and we establish baseline results for the two settings mentioned above.