Tabular Deep Learning

Student: Nikolai Kartashev

Supervisor: Maxim A. Babenko

Faculty: Faculty of Computer Science

Educational Programme: Modern Computer Science (Master)

Final Grade: 9

Year of Graduation: 2024

In the machine learning literature, models trained on tabular data are often evaluated on academic datasets that may not accurately reflect real-world use cases. This paper addresses this issue by critically assessing the quality of popular benchmark datasets and identifying their limitations. We find that many of these datasets are synthetic, represent non-tabular modalities such as images, or consist of snapshots of real-world phenomena or socio-demographic data that are better suited to visualization and analysis than to prediction. Moreover, these datasets often lack canonical and representative test sets, and some even have data leakage issues. To make the evaluation of tabular models more reliable, this paper constructs a new benchmark consisting of seven industrial-grade datasets that represent real-world scenarios more adequately. In particular, all datasets in our benchmark have a temporal shift between the train and test subsets, which reflects the fact that in practice models trained on “older” datapoints are applied to “newer” ones. Furthermore, we assess the performance of state-of-the-art tabular DL models on these datasets and find that some recently proposed techniques are not consistently beneficial and that simpler methods are often preferable.
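
The temporal train/test shift mentioned in the abstract can be illustrated with a minimal sketch. This is not the procedure used in the thesis; the function name `temporal_split`, the column names, and the toy data are all hypothetical and serve only to show the idea of training on older rows and testing on newer ones.

```python
import pandas as pd


def temporal_split(df, time_col, test_fraction=0.2):
    """Hypothetical helper: put the oldest rows in train and the newest in test."""
    # Sort by time so that position in the frame reflects chronological order.
    ordered = df.sort_values(time_col, kind="stable").reset_index(drop=True)
    # Reserve the most recent rows for testing (at least one row).
    n_test = max(1, int(round(len(ordered) * test_fraction)))
    cutoff = len(ordered) - n_test
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]


# Toy data with made-up values, deliberately out of chronological order.
toy = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2024-01-05", "2024-01-01", "2024-01-04", "2024-01-02", "2024-01-03"]
        ),
        "feature": [0.5, 0.1, 0.4, 0.2, 0.3],
        "target": [1, 0, 1, 0, 0],
    }
)

train, test = temporal_split(toy, "timestamp", test_fraction=0.4)
```

Unlike a random split, this keeps every test row at least as recent as every train row, so the evaluation mimics deploying a model on data collected after training.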

Full text (added May 27, 2024)
