Meta vs. OpenAI: Large Open-source Models for Translation

Meta’s open-source Seamless: A deep dive into translation model architectures and a Python implementation using HuggingFace

Luís Roque
Towards Data Science

This post was co-authored with Rafael Guedes.

An organization's growth is not limited by its country's borders; some organizations sell or operate only in foreign markets. This globalization brings several challenges, one of which is handling different languages and reducing the cost of adapting everything from product labeling to promotional materials. Recent developments in AI help here because they allow cheap and quick translation of not only text but also audio material.

Organizations that incorporate AI into their day-to-day activities can stay a step ahead of the competition, especially when preparing everything that surrounds a product for a new market. Timing is as important as the quality of your product or service, so being the first to arrive is crucial. Technologies like speech-to-speech and text-to-text translation help reduce the time you need to enter a new market.

In this article, we explore Seamless, a family of three models developed by Meta to unlock cross-lingual communication. We provide a detailed explanation of each model's architecture and how it works. We finish with a practical implementation in Python using HuggingFace 🤗, where we expose some of the models' limitations and show how to overcome them.

Figure 1: Seamless, a family of models that can understand more than 100 languages (image by author with DALL-E)

As always, the code is available on our GitHub.

Seamless [1] is the first system that tries to remove language barriers and unlock expressive cross-lingual communication in real time. It is composed of multiple models from the Seamless family, such as SeamlessM4T v2 [1], SeamlessExpressive [1], and SeamlessStreaming [1], which allow speech-to-speech and text-to-text translation across 101 input and 36 output languages. Each model will be explained in more detail in…
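
To give a flavor of the text-to-text path before the detailed walkthrough, here is a minimal sketch using the HuggingFace `transformers` library. It is not the implementation from this article: the checkpoint name `facebook/seamless-m4t-v2-large`, the helper names `load_seamless` and `translate_text`, and the optional `processor`/`model` parameters are our assumptions for illustration, and the calls follow the `SeamlessM4Tv2Model` usage as we understand it from the library documentation.

```python
from functools import lru_cache

# Assumption: the public SeamlessM4T v2 checkpoint on the HuggingFace Hub.
MODEL_ID = "facebook/seamless-m4t-v2-large"


@lru_cache(maxsize=1)
def load_seamless(model_id: str = MODEL_ID):
    """Load the processor and model once; the download is several GB."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoProcessor, SeamlessM4Tv2Model

    processor = AutoProcessor.from_pretrained(model_id)
    model = SeamlessM4Tv2Model.from_pretrained(model_id)
    return processor, model


def translate_text(text, src_lang, tgt_lang, processor=None, model=None):
    """Translate text between languages given as three-letter codes ("eng", "fra")."""
    if processor is None or model is None:
        processor, model = load_seamless()
    # Tokenize the source text and tag it with its source language.
    inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")
    # generate_speech=False asks the unified model for text tokens only.
    output_tokens = model.generate(**inputs, tgt_lang=tgt_lang, generate_speech=False)
    # The first element holds the generated token ids, one sequence per input.
    return processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
```

Calling `translate_text("Hello, my dog is cute", src_lang="eng", tgt_lang="fra")` downloads the checkpoint on first use and returns the French translation as a string. Passing `processor` and `model` explicitly lets you reuse objects you have already loaded, or swap in a different checkpoint.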
