Parallel Corpora
About this Dataset
The translation dataset is a collection of parallel corpora: texts translated from English into other languages. It contains 4 billion units across 16 domains. Several quality levels are available, and the quality level affects pricing. Available language pairs (translation from English): Albanian, Arabic, Armenian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, Estonian, Finnish, French, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kyrgyz, Latvian, Lithuanian, Malay, Maltese, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese.
This dataset is covered by our standard Data license agreement. The license agreement is perpetual and allows for the commercialization of all models built on the data.