top of page
Digipal Media Mgr

Multilingual Chatbots supporting 50+ Languages, with Future Growth Hinged on Synthetic Data Innovations



ChatGPT is a versatile multilingual chatbot and currently supports over 50 languages! 🌍💬 This includes Chinese, Japanese, Spanish, French, German, Russian, Arabic, Portuguese, Italian, and more. Large language models (LLMs) excel particularly in languages with extensive training data, encompassing diverse linguistic structures and idioms.


Substantial amounts of well-structured training data, such as example translations, are key for achieving high-quality results. The heat map derived from OPUS parallel corpora indicates the translation quality that can be expected across different languages. Obviously, there are quite some gaps.


Based on observations, data requirements increased about tenfold for every new model generation. What needs to happen for the models’ capabilities to evolve further?


On the assumption that commercial model makers won’t train on private data, then future models must rely heavily on synthetic data or some other new ideas will be required.

23 views0 comments

Comments


bottom of page