Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus

Julien Abadji, Pedro Ortiz Suarez, Laurent Romary, Benoît Sagot

Last updated on Sep 3, 2021

Julien Abadji, Research Engineer
I’m a research engineer on the ALMAnaCH research team at Inria.

Pedro Ortiz Suarez, Researcher
I’m a researcher at the Speech and Language Technology Team at DFKI GmbH Berlin.

Laurent Romary, Senior Researcher
Inria Senior Researcher, DARIAH EU infrastructure, director, ISO/TC 37 chair

Benoît Sagot, Senior Researcher
Inria Senior Researcher in Natural Language Processing and Computational Linguistics