Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus

Julien Abadji
Research Engineer

I’m a research engineer in the ALMAnaCH research team at Inria.

Pedro Ortiz Suarez
Researcher

I’m a researcher at the Speech and Language Technology Team at DFKI GmbH Berlin.

Laurent Romary
Senior Researcher

Senior Researcher at Inria, Director of the DARIAH-EU infrastructure, and Chair of ISO/TC 37

Benoît Sagot
Senior Researcher

Inria Senior Researcher in Natural Language Processing and Computational Linguistics