PhD Position: NATURAL LANGUAGE PROCESSING of song LYRICS – WASABI project (Web Audio Semantic Aggregated in the Browser for Indexation, funded by ANR)

Streaming providers such as Deezer, Spotify, Pandora or Apple Music enrich
the listening experience by giving users additional information, such as
the artist's biography, or by recommending other albums by the same
artist or by other artists they consider "close" to the selected one.
In a similar way, journalists and DJs search the Web for information to
prepare their broadcasts.
These scenarios share the need for, and the resulting use of, musical
knowledge bases, ranging from the results of a keyword search on a
search engine to more formalized resources such as Spotify, LastFM,
MusicBrainz, DBpedia and The Echo Nest audio extractors.

The goal of the WASABI project is to jointly use information extraction
algorithms and the Semantic Web to produce more consistent musical
knowledge bases. Then, Web Audio technologies are applied to explore
them in depth. More specifically, Semantic Web techniques and formalisms
are used to extract and structure data, to link and add metadata from
existing resources (such as recording studios, composers, the broadcast
year of a song/album). Textual data such as song lyrics or free text
related to the songs will be used as sources from which Natural Language
Processing algorithms extract implicit data (such as the topics of the
song, the places, people, events and dates involved, or even the
emotions conveyed). Jointly exploiting this knowledge together with the
information contained in the audio signal can improve the automatic
extraction of musical information, for instance the tempo, the presence
and characterization of the voice, and musical emotions; it can also
help identify plagiarism, or even facilitate music unmixing.

In the context of the WASABI project, the goal of the Ph.D. is to
address the following challenges:
i) Detection of the structure of the song by applying NLP algorithms
(see, e.g. [1]).
ii) Event detection: analysis of the text of the song to extract the
context (both explicit and implicit references), as well as the
extraction of entities, geographic locations and time references
directly or indirectly expressed in the text.
iii) Topic modeling: implementation of probabilistic models to identify
topics or abstract themes in the lyrics by establishing relationships
between a set of documents and the terms they contain (see, e.g. [1,2]).
iv) Sentiment analysis: classification of the emotions in the songs,
using music and song lyrics in a complementary manner. We will test
and adapt machine learning algorithms to capture information about the
emotions expressed by the text of a song, exploiting both textual
features and data extracted from the audio (see, e.g. [3]).

- [1] Jose P. G. Mahedero, Álvaro Martínez, Pedro Cano, Markus
Koppenberger, and Fabien Gouyon. 2005. Natural language processing of
lyrics. In Proceedings of the 13th annual ACM international conference
on Multimedia (MULTIMEDIA '05). ACM, New York, NY, USA, 475-478.
- [2] Lucas Sterckx, Thomas Demeester, Johannes Deleu, Laurent Mertens,
and Chris Develder. 2014. Assessing quality of unsupervised topics in
song lyrics. In 36th European Conference on Information Retrieval
(ECIR 2014), Lecture Notes in Computer Science.
- [3] Rada Mihalcea and Carlo Strapparava. 2012. Lyrics, music, and
emotions. In Proceedings of the 2012 Joint Conference on Empirical
Methods in Natural Language Processing and Computational Natural
Language Learning (EMNLP-CoNLL '12). Association for Computational
Linguistics, Stroudsburg, PA, USA, 590-599.

Skills and profile:

     • Master's degree in Computer Science or Computer Engineering is
     • Programming skills.
     • Basic knowledge of Natural Language Processing, Semantic Web and
Machine Learning is preferred.
     • Fluent English required
     • Knowledge of French is preferred

About WIMMICS - Wimmics is a joint research
team between INRIA Sophia Antipolis - Méditerranée and I3S (CNRS and
University of Nice – Sophia Antipolis). The research fields of this team
are graph-oriented knowledge representation, reasoning and
operationalization to model and support actors, actions and interactions
in web-based epistemic communities.

Location: I3S laboratory, Sophia Antipolis, France.

Contact : Michel Buffa: Elena Cabrio: