Bibliome Team

"Knowledge acquisition and formalization from text"

 

Head: Claire Nédellec

The Bibliome team develops Natural Language Processing (NLP) and Machine Learning (ML) methods to extract information from text in the biology domain.

We work on specific information extraction (IE) tasks such as entity recognition, entity normalization (linking) and relation extraction. We focus on methods that combine linguistic information, ML and domain knowledge (ontologies and taxonomies) and are able to handle a small number of training examples.

We apply our methods to a wide range of biological applications, from microbial diversity to plant biology and epidemiological surveillance.

An important part of our activity is also to promote the development and evaluation of IE systems by organizing shared tasks.


Projects

Current projects

CoSO project, Visa TM (Towards an advanced infrastructure in text-mining) (2017-2019)

OpenMinTeD, H2020 text-mining infrastructure (2015-2018)

D-ONT, Exploitation optimisée des bases de données phénotypiques - Des ontologies pour le partage d’information (Optimized use of phenotypic databases: ontologies for information sharing), ACI Phase 2016-2018

Recent projects

IMSV, Institut de modélisation des systèmes vivants (Institute for Modeling Living Systems), Lidex of Université Paris-Saclay (2014-2016)

SeeDev, Regulations in the development of Arabidopsis thaliana seed (Challenge Lidex CDS) (2015)

OntoBiotope: INRA MEM metaprogramme (Metagenomics of microbial ecosystems) (2012-2013).

Triphase: Semantic information system for publications in animal physiology and agricultural systems. PHASE department (2013-2014).

Quaero: Automatic multimedia content processing. Oséo (2008-2013).

FSOV SAM Blé: Selection of wheat by genetic markers. Fonds de soutien à l'obtention végétale (2010-2013).


Community activities

Labex DigiCosme D2K working group (from Data to Knowledge)

INRA CATI ICAT (Knowledge Engineering and Text Analysis)

BioNLP-Shared Task (2011, 2013, 2016): annotated corpora and on-line evaluation services

LLL, Learning Language in Logic (2005)


Members

Claire Nédellec, Research Director, head of the Bibliome team

Robert Bossy, Research Engineer, head of the Alvis Suite

Louise Deléger, Researcher
Arnaud Ferré, Postdoc
Reda Mekdad, Study Engineer

Former members

Mouhamadou Ba, Postdoc, OpenMinTeD project
Estelle Chaix, Postdoc, OpenMinTeD project
Philippe Bessières, Research Director
Dialekti Valsamou, PhD student, IDEX IDI

Software

Visit us on GitHub.

  • AlvisNLP/ML is a pipeline for the semantic annotation of text documents. It integrates natural language processing tools for word and sentence segmentation, named-entity recognition, term analysis, semantic typing and relation extraction. These tools rely on external resources such as terminologies and ontologies. AlvisNLP/ML provides several tools for the (semi-)automatic acquisition of these resources, based on machine learning techniques. The pipeline is easy to configure and can be extended with new components. Part of this work has been funded by the European project Alvis and the French project Quaero. See Nédellec et al., Handbook on Ontologies, 2009.
  • AlvisAE (Alvis Annotation Editor) is an on-line annotation editor for viewing and annotating the entities and relations in a text. It includes functions for managing annotation campaigns. Entities can be annotated with concepts from an ontology, and the ontology can be revised at the same time. It is integrated with AlvisNLP. Part of this work has been funded by the French project Quaero. See the LAW VI paper for more details.
  • AlvisIR (Alvis Information Retrieval) is a generic on-line semantic search engine; only a few hours are needed to create a new instance for a given document collection and ontology. A user query containing ontology concepts retrieves all documents that mention those concepts, whether as more specific concepts or as synonyms. AlvisIR also handles relational queries. See, for example, the search on biotopes of microorganisms. Part of this work has been funded by the European project Alvis and the French project Quaero.
  • BioYaTeA is an extension of the YaTeA term extractor that deals with prepositional attachments and adjectival participles. It extracts terms from documents in French and in English. Its distribution includes post-filtering of irrelevant terms. It is publicly available as a CPAN module. Part of this work has been funded by the European project Alvis and the French project Quaero. See (Golik et al., CICLing 2013) for more details.
  • TyDI (Terminology Design Interface) is a collaborative tool for the manual validation and structuring of terms, either originating from terminologies or extracted from a training corpus of text documents. It is used on the output of term extractors (such as BioYaTeA), which identify candidate terms (e.g. compound nouns). With TyDI, a user can validate candidate terms and specify synonymy and hyperonymy relations. These annotations can then be exported in several formats and used in other natural language processing tools. Part of this work has been funded by the French project Quaero. See (Golik et al., EKAW 2010) for more details.
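
The kind of concept-based retrieval described for AlvisIR above (a query concept matching documents through its more specific concepts and synonyms) can be illustrated with a small sketch. This is not AlvisIR code: the toy ontology, synonym table, documents and the function names `expand` and `search` are all hypothetical, and plain substring matching stands in for real semantic annotation.

```python
# Hypothetical toy ontology: each concept maps to its direct sub-concepts.
SUBCONCEPTS = {
    "food": ["dairy product", "meat"],
    "dairy product": ["cheese", "yogurt"],
}

# Hypothetical synonym table for some concepts.
SYNONYMS = {"yogurt": ["yoghurt"]}

# Hypothetical document collection: id -> text.
DOCUMENTS = {
    "d1": "Lactobacillus strains isolated from yoghurt.",
    "d2": "Bacteria found in hot springs.",
    "d3": "Listeria contamination of cheese.",
}


def expand(concept):
    """Return the concept, all its descendants, and their synonyms."""
    terms, visited, stack = set(), set(), [concept]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        terms.add(current)
        terms.update(SYNONYMS.get(current, []))
        stack.extend(SUBCONCEPTS.get(current, []))
    return terms


def search(concept, documents):
    """Return ids of documents whose text contains any expanded term."""
    terms = expand(concept)
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    ]


print(search("dairy product", DOCUMENTS))
```

In this sketch, querying "dairy product" retrieves the yoghurt and cheese documents even though neither contains the query string itself, and leaves out the hot-spring document.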

Online Services

Semantic search engines based on the AlvisIR technology

  • The Biotope relational search engine indexes all PubMed references on habitats of microorganisms and phenotypes (2.3 million references) with the Alvis Suite technology and the OntoBiotope ontology. Funded by OpenMinTeD, the Quaero project and the MEM metaprogramme.
  • SamBlé indexes a large set of references on genetic markers and phenotypes in bread wheat with the Alvis Suite technology and the Wheat Trait Ontology. Supported by the FSOV SamBlé project and OpenMinTeD.
  • SeeDev indexes a large set of references on the molecular mechanisms involved in seed development using the Alvis Suite technology. Supported by the UPSay CDS and IMSV projects and OpenMinTeD.
  • TriPhas’IR indexes the publications of the PHASE scientific department (2010-2014) with the TriPhase termino-ontology.
  • AnimalIR indexes Animal Journal articles with the ATOL ontology.

Other online services

  • Florilege is an on-line database that integrates information from various sources about positive flora in food and beyond, including textual data about habitats from articles and databases, BRC and genetic databases.
  • Cocitations is an on-line interface that indexes sentences from PubMed references on the model bacterium Bacillus subtilis that mention at least two gene or protein names. Users can query it by one or two gene or protein names and display the matching sentences with the names underlined. Synonyms and renamings are handled. Users can also search for genetic information through the IGo portal.
  • OntoBiotope Database is an on-line service for navigating the OntoBiotope database of microorganisms and habitats described in PubMed references. The result of a user query is displayed as a treemap.
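
As an illustration of the co-citation idea used by the Cocitations service above (keeping sentences that mention at least two distinct gene or protein names, with synonyms mapped to one name), here is a minimal sketch. It does not reflect the actual implementation: the gene names, alias table, sentences and function names are invented for the example.

```python
import re

# Hypothetical alias table: lowercased surface form -> canonical gene name.
GENE_ALIASES = {
    "genea": "geneA",
    "ga": "geneA",  # made-up synonym of geneA
    "geneb": "geneB",
    "genec": "geneC",
}

# Hypothetical sentences standing in for PubMed reference sentences.
SENTENCES = [
    "gA activates geneB during sporulation.",
    "geneA is expressed early.",
    "geneB represses geneC.",
    "geneA is also called gA.",
]


def genes_in(sentence):
    """Return the set of canonical gene names mentioned in a sentence."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    return {GENE_ALIASES[t.lower()] for t in tokens if t.lower() in GENE_ALIASES}


def canonical(name):
    """Map a query name to its canonical form, if it is a known alias."""
    return GENE_ALIASES.get(name.lower(), name)


def cocitations(sentences, gene1, gene2=None):
    """Return sentences citing gene1 (and gene2, if given) among >= 2 genes."""
    wanted = {canonical(gene1)}
    if gene2 is not None:
        wanted.add(canonical(gene2))
    return [
        s for s in sentences
        if len(genes_in(s)) >= 2 and wanted <= genes_in(s)
    ]


print(cocitations(SENTENCES, "gA"))
print(cocitations(SENTENCES, "geneB", "geneC"))
```

Note that the last sentence is not returned for a "geneA" query: its two mentions resolve to the same gene, so it is not a co-citation in this sketch.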