PubMed Ontology LEarning with Deep Learning (POLE-DL)

Distributional semantic models (DSMs) derive representations for words in such a way that words occurring in similar contexts have similar representations. Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) are well-known DSMs. Although neural models are not new to DSMs, recent advances in artificial neural networks make it feasible to derive word representations from corpora of billions of words, hence the growing interest in Deep Learning. However, word embeddings (i.e. distributed word representations typically induced using neural language models) and traditional distributional semantics methods lack the precise formal definitions that can be found in an ontology. According to Maedche and Staab, an ontology can be described as “sets of concepts, relations, lexical entries, and links between these entities”.
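
The distributional idea described above (words occurring in similar contexts receive similar representations) can be illustrated with a minimal sketch. This is not the POLE-DL method, LSA, LDA, or a neural embedding model; it is a plain co-occurrence count over an invented toy corpus, and the function names, the window size of 2, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "dog" and "cat" appear in near-identical contexts,
# while "room" appears in an unrelated one.
toy_corpus = [
    "the dog was treated for an ear infection",
    "the cat was treated for an eye infection",
    "the clinic opened a new waiting room",
]

vectors = cooccurrence_vectors(toy_corpus)
sim_dog_cat = cosine(vectors["dog"], vectors["cat"])
sim_dog_room = cosine(vectors["dog"], vectors["room"])
print(sim_dog_cat > sim_dog_room)
```

In this sketch "dog" and "cat" end up with more similar vectors than "dog" and "room" because they share context words. Nothing in the vectors states what kind of thing a dog is or how it relates to an infection, which is the kind of formal definition an ontology supplies.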

Semantic Deep Learning

Semantic Deep Learning is a newly coined term for an emerging area that combines Semantic Web resources and technologies with Deep Learning. The Semantic Web embodies standards and tools for publishing and processing metadata, with ontologies at its core.

Our work falls within the area of Semantic Deep Learning and aims to automatically identify concepts and relations in both biomedical publications and clinical narratives. We focus mostly on two large-scale datasets: PubMed and the Small Animal Veterinary Surveillance Network (SAVSNET). PubMed is the largest biomedical resource. As of February 2018, PubMed contained 28M citations; in 2016, an average of 2 papers were added per minute. SAVSNET has recently collected a dataset of 2.5M de-identified free-text clinical veterinary narratives from approximately 500 veterinary clinics across the UK.

Text Mining