Master's programme
2020/2021
Research Seminar "Intelligent Systems and Structural Analysis"
Status:
Compulsory course (Data Science)
Field of study:
01.04.02. Applied Mathematics and Informatics
Delivered at:
Faculty of Computer Science
When:
2nd year, modules 1 and 2
Mode of study:
without an online course
Programme:
Data Science
Language:
English
Credits:
8
Contact hours:
32
Course Syllabus
Abstract
The goal of the course is to develop the professional skills students need for independent analytical work in applied fields of computer science. The course also aims to improve students' skills in developing research projects related to dialogue systems and chat bots. It focuses on analysing how scientific and industrial linguistic systems are developed, and it encourages students to attend scientific colloquia at the university, especially at the Faculty of Computer Science.
Learning Objectives
- The Research Seminar should help students build the basic skills needed to carry out and present their own research, and motivate them to engage in scientific activity.
Expected Learning Outcomes
- Know basic principles of developing task-oriented linguistic dialogue systems.
- Know main principles of social bots.
- Know main principles of task-oriented bots.
- Know fundamental approaches to natural language understanding and dialogue management in the task-oriented dialogue systems.
- Know basic principles of assuring chat bot relevance at syntactic level.
- Know basic principles of Q/A for Bots.
- Know basic principles of discourse-level structures.
- Know basic principles of building taxonomy and thesaurus for chat bots.
- Know basic principles of chat bot content processing pipeline.
- Know basic principles of managing rhetorical agreement in dialogue utterances.
- Know basic principles of discourse-level dialogue management.
- Know basic principles of argumentation for chat bot.
- Formulate the task and goals for an independent research project and/or the development of a scientific software system.
- Prepare a presentation based on their own research and/or scientific software system.
Course Contents
- A basic chat bot
- Building transactional chatbots with Api.ai;
- Building FAQ chatbot with Microsoft QnA Maker;
- A chatbot with rule-based dialogue management.
- Social Bots
- Main principles.
- Task-oriented Bots
- Main principles.
- NL Understanding
- Introduction to NLP and NLU.
- Assuring chat bot relevance at syntactic level
- Syntactic Generalization in search and relevance assessment;
- Generalizing portions of text;
- Generalizing at various levels: From words to paragraphs;
- Equivalence transformation on phrases;
- Simplified example of generalization of sentences;
- From syntax to inductive semantics;
- Nearest-neighbor learning of generalizations;
- Syntactic generalization-based search engine and its evaluation;
- User interface of search engine;
- Qualitative evaluation of search;
- Evaluation of web search relevance improvement;
- Evaluation of product search;
- Comparison with other means of search relevance improvement;
- Evaluation of text classification problems;
- Comparative performance analysis in text classification domains;
- Example of recognizing meaningless sentences;
- Commercial evaluation of text similarity improvement.
- Q/A for Bots: Semantic headers and semantic skeletons
- Learning Discourse-level structures
- Answering paragraph-size questions;
- From sentence-level to paragraph-level generalization;
- Rhetoric structures and speech acts as inter-sentence links;
- Adapting RST for multi-sentence search;
- Adapting Speech Act Theory for multi-sentence search;
- Parse thickets and their graph representation;
- Equivalence transformation of phrases;
- Finding similarity between two paragraphs of text;
- How coreferences help search recall;
- How rhetoric relations improve search accuracy;
- Thicket Phrases and their generalization;
- Example of parse thicket;
- Generalization of parse thickets;
- Generalization for RST arcs;
- Generalization for CA arcs;
- Computing maximal common sub-PTs;
- Architecture of PT processing system;
- Evaluation of PT-supported search relevance;
- Evaluation settings;
- Pair-wise sentence generalization for question-answer similarity;
- Single sentence query and answer distributed through multiple sentences;
- Query is a paragraph and answer is a paragraph;
- Phrase-based and graph-based implementation of generalization;
- Comparison of search performance with other studies.
- Building taxonomy and thesaurus for chat bots
- Improving search relevance by taxonomies;
- Must-occur keywords;
- Must-occur keywords in a taxonomy;
- Constructing relevance score function;
- Examples of filtering answers based on taxonomy;
- Taxonomy-based algorithm for filtering search results;
- Building taxonomies by web mining;
- Building taxonomy by generalizing search results;
- Practical considerations;
- Evaluation of search relevance improvement by taxonomies;
- Evaluation settings of search relevance improvement;
- Vertical search;
- Web search relevance improvement;
- Taxonomy-supported search engine in news domain;
- Taxonomies for query expansion;
- Using search in Similarity component;
- Running taxonomy learner.
- Chat bot content processing pipeline
- From search to personalized recommendations;
- A content pipeline and its relevance-related problems;
- Content pipeline architecture;
- Content processing engines;
- Content processing units;
- Harvesting unit;
- Content mining unit;
- Taxonomy unit;
- Opinion mining unit;
- De-duplication unit;
- Search Engine Marketing unit;
- Speech recognition semantics unit;
- Search unit;
- Personalization unit;
- Generalization of texts;
- Simplified example of generalization of sentences;
- Sample generalization between phrases;
- Tree Kernel approach for text similarity;
- Phrase-level generalization;
- Generalization of expressions of interest;
- Personalization algorithm as intersection of likes;
- Mapping categories of interest / taxonomies;
- Defeasible logic programming-based rule engine;
- Content pipeline algorithms;
- Taxonomy construction algorithm;
- De-duplication algorithms;
- Sentiment analysis algorithm;
- Search engine marketing ad construction algorithm.
- Managing Rhetorical Agreement in Dialogue Utterances
- Communicative Discourse Trees;
- Representing rhetorical relations and communicative actions;
- Greedy representations for a Q/A pair;
- Communicative actions and their generalization;
- Generalization for RST relations;
- Representing a Request-Response chain;
- Classification settings for Request-Response pairs;
- Nearest Neighbor graph-based classification;
- Thicket Kernel learning for CDT;
- Implementation of Rhetorical Agreement classifier;
- Discourse Structure-Driven Dialogue Management;
- Maintaining cohesive session flow in a chat bot;
- Personalized Domain Exploration Scenarios;
- Navigation with the Extended Discourse Tree;
- Recognizing valid and invalid R-R pairs;
- CDT Construction Task;
- Managing dialogues and question answering;
- Analytical approaches to RR Agreement;
- Rhetorical relations and argumentation.
- Discourse-level Dialogue management
- Finding Answers with Optimal Rhetoric Representation;
- Adjusting rhetoric representation of answer to that of a question;
- Maintaining a sequence of discourse trees;
- Identifying rhetoric correlation;
- Building Dialogue Structure from Discourse Tree of a Query;
- Maintaining communicative discourse for Q and A;
- Learning complement relation.
- Data for chat bot training
- Argumentation for chat bot
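One of the topics listed above, the simplified generalization of two sentences, can be illustrated with a small sketch. It is not the course's implementation: it assumes that tokens are already tagged by hand as (word, part-of-speech) pairs, that generalization keeps a word when both sentences agree on it and replaces it with `*` when only the tag agrees, and that alignment is scored 2 for a matching word and tag and 1 for a matching tag only. The name `generalize` and the scoring weights are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag), hand-tagged for this sketch


def generalize(a: List[Token], b: List[Token]) -> List[Token]:
    """Align two tagged phrases and return their common, generalized part."""
    n, m = len(a), len(b)

    def weight(x: Token, y: Token) -> int:
        # Assumed scoring: same word and tag = 2, same tag only = 1, otherwise 0.
        if x[1] != y[1]:
            return 0
        return 2 if x[0] == y[0] else 1

    # score[i][j] = best alignment score of a[i:] against b[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(score[i + 1][j], score[i][j + 1])
            w = weight(a[i], b[j])
            if w:
                best = max(best, w + score[i + 1][j + 1])
            score[i][j] = best

    # Trace back through the table to collect the aligned tokens.
    result: List[Token] = []
    i = j = 0
    while i < n and j < m:
        w = weight(a[i], b[j])
        if w and score[i][j] == w + score[i + 1][j + 1]:
            word = a[i][0] if w == 2 else "*"
            result.append((word, a[i][1]))
            i += 1
            j += 1
        elif score[i][j] == score[i + 1][j]:
            i += 1
        else:
            j += 1
    return result


first = [("camera", "NN"), ("with", "IN"), ("digital", "JJ"), ("zoom", "NN")]
second = [("camcorder", "NN"), ("with", "IN"), ("optical", "JJ"), ("zoom", "NN")]
print(generalize(first, second))
```

For the two example phrases, the shared structure is a noun, the word "with", an adjective, and the word "zoom"; the differing words are replaced by `*` while their tags are kept.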
Assessment Elements
- Project
- Presentation
Presentation of the programming project, paper, dialogue system, or dialogue platform.
Speaking time is up to 30 min.
Interim Assessment
- Interim assessment (module 2)
If a student chooses the project track, the final mark is calculated as:
O_final = 0.6 · O_project + 0.4 · O_presentation.
If a student chooses the track with a review of existing platforms or papers, the final mark is calculated as: O_final = 1 · O_presentation.
Every track also requires submitting a final report on the project, publicly defending the project in the form of a presentation, and attending at least 2 presentations.
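The two grading formulas above can be written as a short function. This is only an illustrative sketch: the function name `final_grade`, its parameters, and the sample marks are hypothetical, and no rounding rule is applied because the syllabus does not state one.

```python
from typing import Optional


def final_grade(presentation: float, project: Optional[float] = None) -> float:
    """Combine the marks according to the chosen track.

    Project track: 0.6 * project + 0.4 * presentation.
    Review track (no project mark): the presentation mark alone.
    """
    if project is None:
        return presentation
    return 0.6 * project + 0.4 * presentation


# Illustrative marks only, not taken from the syllabus.
print(final_grade(presentation=9, project=7))  # project track
print(final_grade(presentation=8))             # review track
```

With a project mark of 7 and a presentation mark of 9, the project track gives 0.6 · 7 + 0.4 · 9 = 7.8, up to floating-point rounding.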