
The Influence of LLM Inference Optimization on LLM Quality

Student: Borisov Artem

Supervisor: Tamara Voznesenskaya

Faculty: Faculty of Computer Science

Educational Programme: Data Science and Business Analytics (Bachelor)

Final Grade: 8

Year of Graduation: 2024

The rapid growth of large language models (LLMs) in recent years has driven significant advances in natural language processing (NLP). These models, with their unprecedented capacity to understand and generate human-like text, have demonstrated remarkable performance across a wide range of applications. However, their increasing size and complexity pose substantial challenges in terms of computational requirements, memory consumption, and inference speed. This thesis explores three inference acceleration methods: quantization, flash-attention, and DeepSpeed inference. Quantization reduces the precision of data representations within neural networks, thereby decreasing computational demands and memory usage. Flash-attention optimizes memory access patterns and computational efficiency, while DeepSpeed inference employs kernel fusion and memory optimization techniques to speed up model execution. The study investigates the impact of these optimization strategies on model accuracy and performance using the popular open-source models Llama2-7b, Mistral-7b and Mixtral-8x7b. The results provide insights into the trade-offs between efficiency and accuracy, based on an evaluation of these methods on the MMLU and RUMMLU benchmarks. This work aims to contribute to making large language models more accessible and efficient, extending their applicability to a broader range of tasks and environments.
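As a rough illustration of the quantization idea mentioned in the summary, the sketch below maps floating-point values to signed 8-bit integers with a single shared scale and then restores them. It is a minimal example under stated assumptions, not the method used in the thesis: the symmetric per-tensor scheme, the function names `quantize_int8` and `dequantize_int8`, and the sample `weights` are all hypothetical choices made for this illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to signed 8-bit integers.

    Assumption: one shared scale for the whole list (illustrative scheme only).
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


# Hypothetical sample weights, chosen only for demonstration.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The rounding step bounds the per-value error by about half a scale step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each value is stored in one byte instead of four (for 32-bit floats), at the cost of a reconstruction error of at most roughly half a quantization step. This is the efficiency versus accuracy trade-off that the summary refers to.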

Full text (added May 27, 2024)

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
