Master Thesis - Advanced intent recognition and question answering for search engines using Machine Learning

Job Description

Are you a master's student planning to write your master's thesis during spring 2020? Join us on our journey into the future. #Siemens


Be part of an open and dynamic workplace where professional and personal development is high on the agenda. By making sustainable energy solutions more cost-effective, developing new technologies for the smart industry of the future and electrifying passenger and freight transport, we turn our vision of a sustainable world into reality.


We are now looking for a student to take on the assignment “Advanced intent recognition and question answering for search engines using Machine Learning.”


Who are we?

Data Analytics at SIT AB: the digital transformation carried out by Siemens in recent years has had many consequences. One of the most important has been the establishment of processes to collect and maintain useful data in database form. On the one hand, the extensive maintenance report database provides the company with useful information about unexpected events, component repairs and operation history. On the other hand, multiple sensors placed along the turbine deliver information about thermodynamic parameters and operating parameters such as the amount of MWh produced. The data analytics department has worked extensively with this data for both internal and external customers. For almost three years we have also been implementing knowledge graph technologies because of their high flexibility and data modelling capabilities. We have developed mappings, ontologies and ETL loads to populate our graphs, and now we are ready for the next step.


The assignment:

This project is part of Siemens' efforts to collect, connect, maintain and analyze the life-cycle data of its different products in the smartest way possible. In this project we would like to take the first steps toward creating a fully automated search engine with intent recognition.


Unlike other data, industrial domain knowledge is extremely complex to model. The same concept (for instance, site) can have different meanings across business units even though they all refer to the same idea. Current ontologies support synonyms and many other features that can deal with this characteristic, but these have to be added by a knowledge engineer. In addition, incorporating individual user knowledge is not always straightforward, as ontologies are usually suited to modelling domain knowledge but not the knowledge of individuals.


In this master's thesis you should develop a holistic approach to the intent recognition and question answering problem. Some of the following questions should be answered:

  1. Can the question answering (QA) and intent recognition problem in the Siemens industrial knowledge domain be formulated according to pre-existing frameworks?

  2. Do we have alternatives to adding synonyms to the ontologies? Can we build pattern-matching algorithms that link different inputs to the search engine with existing content?

  3. Is the data contained in our graph better suited to an open-domain QA system or to a closed-domain QA system?

  4. Can existing practices such as word frequency analysis, n-gram techniques, word-embedding distances, etc. help with the question answering problem?

  5. Do we need to pre-generate a specific dataset for QA according to the content of our graph, or can we start applying machine learning algorithms directly to our graph?

  6. What kinds of algorithms are useful for predicting what the user of the search engine is typing (review of seq2seq models, transformers, bidirectional RNNs)?

  7. How can type-ahead prediction, question answering frameworks and ontology semantics be combined to build the fully automatic search engine?
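
As an illustration of questions 2 and 4, the following is a minimal sketch of how character n-gram overlap could link a free-text query to labels that already exist in a graph, without a knowledge engineer adding every synonym or spelling variant by hand. The labels, the threshold value and the function names are hypothetical and are not taken from any Siemens system; a thesis would need to evaluate such a baseline against embedding-based or learned alternatives.

```python
from typing import Iterable, Optional, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a lower-cased, space-padded string."""
    normalised = " ".join(text.lower().split())
    padded = f" {normalised} "
    if len(padded) < n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_label(
    query: str, labels: Iterable[str], threshold: float = 0.3
) -> Optional[Tuple[str, float]]:
    """Return the most similar label and its score, or None below the threshold."""
    query_grams = char_ngrams(query)
    scored = [(label, jaccard(query_grams, char_ngrams(label))) for label in labels]
    if not scored:
        return None
    label, score = max(scored, key=lambda pair: pair[1])
    return (label, score) if score >= threshold else None


# Hypothetical labels standing in for concepts already present in a graph.
labels = ["site", "gas turbine", "maintenance report"]

# A misspelled, pluralised query still resolves to an existing label.
print(best_label("maintenence reports", labels))
```

The threshold (an assumed value of 0.3 here) controls how readily an input is linked to existing content; inputs that match nothing return None and could be passed on to a knowledge engineer or a heavier model.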

Given the broad scope of the problem, the student should divide the thesis into two main blocks:

  1. Develop the holistic approach to the problem.

  2. Pick one (or several) of the questions mentioned above and develop it in detail.

Students will be given access to all the data they need. They will work closely with domain experts with strong backgrounds in knowledge domain modelling, semantics, machine learning and deep learning. In addition, they will be in touch with our research teams around the world that are already working on similar topics.



Your Profile: 

  • The project is suitable for one or two students with academic background in engineering, computer science, mathematics or another relevant field.

  • As a student you have strong analytical skills and a solid mathematical background.

  • You are also interested in data analytics (especially prescriptive analytics) and have good programming skills.

  • Knowledge of machine-learning-oriented NLP libraries (such as DeepPavlov, PyTorch, TensorFlow, Keras or Stanford’s CoreNLP) and data handling libraries (Pandas or Tidyverse) is considered a merit.

  • Knowledge of SPARQL is considered a merit.

  • Knowledge of ontologies and semantics is considered a merit.

 

Application:

Do not hesitate - apply today via siemens.se, ref. no. 179988, no later than 2019-11-30. For questions about the role, please contact recruiting manager Ronny Nordberg at ronny.norberg@siemens.com. For questions about the technical aspects of the project, please contact edgar.bahilo_rodriguez@siemens.com, rodriguez@siemens.com or davood.naderi@siemens.com.


Trade Union representatives:
Christine Lindström, Unionen, 0122-817 28
Simon Bruneflod, Sveriges Ingenjörer, 0122-842 24
Jan Lundgren, Ledarna, 0122-812 33
Kenth Gustavsson, IF Metall, 0122-815 25

--------------------------------------------------------------------------------


For this recruitment we decline all calls regarding advertising and recruitment support.

 

 


Job ID: 179988

Organization: Gas and Power

Company: Siemens Industrial Turbomachinery AB

Experience Level: Student (Not Yet Graduated)

Job Type: Full-time
