
Group and Shuffle: Researchers at HSE University and AIRI Accelerate Neural Network Fine-Tuning

Researchers at HSE University and the AIRI Institute have proposed a method for quickly fine-tuning neural networks. Their approach involves processing data in groups and then optimally shuffling these groups to improve their interactions. The method outperforms alternatives in image generation and analysis, as well as in fine-tuning text models, all while requiring less memory and training time. The results have been presented at the NeurIPS 2024 Conference.

The larger the neural network, the more challenging it becomes to quickly adapt it to a new task. Retraining a model from scratch is a time-consuming and costly process. Therefore, developers seek cost-effective ways to adapt a model to a specific task while preserving the overall quality of the original.

One such approach is fine-tuning with orthogonal matrices, which, unlike other methods, preserves the essential features of the original model. However, existing parametrizations of this kind, such as block-diagonal or butterfly matrices, have drawbacks: they are either limited in expressiveness or require extensive computation.

Researchers at the HSE Faculty of Computer Science and the AIRI Institute have proposed a new way of constructing matrices, which they call Group-and-Shuffle (GS). Instead of working with all the parameters at once, they divide them into small groups, process each group separately, and then shuffle the results between groups. This structure is both flexible and efficient: it enables the model to adapt more precisely to the task while requiring fewer computations and less memory.
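
To make the idea concrete, here is a minimal PyTorch sketch of a group-and-shuffle-style matrix: two block-diagonal factors act on small groups of coordinates, and a permutation shuffles coordinates between the groups in between. The function name, block sizes, and the particular permutation are illustrative assumptions, not the authors' exact parametrization.

```python
import torch

def group_and_shuffle(blocks_a, blocks_b, perm):
    """Illustrative GS-style matrix: block-diagonal factors process groups
    of coordinates, and a permutation shuffles coordinates between groups.
    A simplified sketch, not the authors' exact construction."""
    A = torch.block_diag(*blocks_a)      # first grouped transform
    B = torch.block_diag(*blocks_b)      # second grouped transform
    P = torch.eye(A.shape[0])[perm]      # permutation matrix (the "shuffle")
    return B @ P @ A                     # group -> shuffle -> group

# Example: two groups of size 2, giving a 4x4 matrix overall
blocks_a = [torch.randn(2, 2) for _ in range(2)]
blocks_b = [torch.randn(2, 2) for _ in range(2)]
perm = torch.tensor([0, 2, 1, 3])        # interleave coordinates across groups
M = group_and_shuffle(blocks_a, blocks_b, perm)
print(M.shape)  # torch.Size([4, 4])
```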

Building on GS matrices, the researchers developed GSOFT, a new method for orthogonal fine-tuning of neural networks. Unlike previous approaches, GSOFT uses fewer parameters while maintaining training stability and quality, even with limited data. The team also introduced a two-sided version of the method—Double GSOFT—which allows simultaneous adjustment of parameters from both sides, enhancing the model’s flexibility and accuracy.
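
The press release does not spell out the exact update rule, but orthogonal fine-tuning schemes of this kind typically keep the pretrained weight frozen and train only orthogonal factors that multiply it from one or both sides. The sketch below illustrates that scheme, with PyTorch's generic orthogonal parametrization standing in for the structured GS factors; the class and argument names are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class OrthogonalFinetune(nn.Module):
    """Sketch of one-sided vs. two-sided orthogonal fine-tuning: the pretrained
    weight W is frozen and only orthogonal factors are trained. A dense
    orthogonal parametrization stands in for the structured GS factors."""

    def __init__(self, pretrained: nn.Linear, two_sided: bool = False):
        super().__init__()
        out_f, in_f = pretrained.weight.shape
        self.register_buffer("w", pretrained.weight.detach().clone())    # frozen W
        if pretrained.bias is not None:
            self.register_buffer("b", pretrained.bias.detach().clone())  # frozen bias
        else:
            self.b = None
        self.q_left = orthogonal(nn.Linear(out_f, out_f, bias=False))    # trainable Q_L
        self.q_right = orthogonal(nn.Linear(in_f, in_f, bias=False)) if two_sided else None

    def forward(self, x):
        w = self.q_left.weight @ self.w          # one-sided update: Q_L W
        if self.q_right is not None:             # two-sided ("double") update: Q_L W Q_R
            w = w @ self.q_right.weight
        return nn.functional.linear(x, w, self.b)

layer = nn.Linear(64, 64)
tuned = OrthogonalFinetune(layer, two_sided=True)
out = tuned(torch.randn(8, 64))
print(out.shape)  # torch.Size([8, 64])
```

In the actual methods the orthogonal factors would start at the identity, so that fine-tuning begins exactly from the pretrained model, and the GS structure keeps them cheap to store and multiply; the dense parametrization here serves only to show the difference between the one-sided and two-sided schemes.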

'We discovered how to construct orthogonal matrices using only two special types of matrices, instead of five or six as required by previous methods. This saves computational resources and training time,' explains Nikolay Yudin, Research Assistant at the HSE Laboratory for Matrix and Tensor Methods in Machine Learning.
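
The algebraic fact behind this is that permutation matrices are orthogonal and a block-diagonal matrix with orthogonal blocks is orthogonal, so a product of such factors is again orthogonal. A quick numerical check, using illustrative sizes rather than the paper's exact factorization:

```python
import torch

def random_orthogonal(n):
    """Random n x n orthogonal matrix via QR decomposition."""
    q, _ = torch.linalg.qr(torch.randn(n, n))
    return q

# Block-diagonal matrices with orthogonal blocks, plus a random permutation
A = torch.block_diag(*[random_orthogonal(2) for _ in range(2)])
B = torch.block_diag(*[random_orthogonal(2) for _ in range(2)])
P = torch.eye(4)[torch.randperm(4)]

M = B @ P @ A                          # product of the two special factor types
print(torch.allclose(M @ M.T, torch.eye(4), atol=1e-5))  # True: M is orthogonal
```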

The researchers tested the approach on three types of tasks. When fine-tuning the RoBERTa language model, the method outperformed others while using a comparable number of parameters. In image generation, where the model needed to preserve the original features while adapting to the user’s request, GSOFT and Double GSOFT outperformed popular methods like LoRA and BOFT, all while using less memory and training time.

Subject-driven generation: visual results after 3,000 training iterations
© Gorbunov, M., Yudin, N., Soboleva, V., Alanov, A., Naumov, A., Rakhuba, M. (2024). Group and shuffle: Efficient structured orthogonal parametrization. arXiv preprint arXiv:2406.10019.

The authors also tested their approach on convolutional neural networks, which are commonly used for image and video analysis, for example in face recognition. The team adapted the GS matrices even to settings where the model must remain strongly robust to noise and distortions.
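
The press release does not give the details of this adaptation, but a common way to apply orthogonal factors to a convolution is to reshape its kernel into a matrix and rotate, for example, the output channels. The sketch below illustrates only that reshaping step, with a dense orthogonal factor as a hypothetical stand-in for the GS structure.

```python
import torch
import torch.nn as nn

# Illustration only: rotate the output channels of a frozen conv kernel with an
# orthogonal factor (a dense stand-in here; the paper's GS matrices would replace it).
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
W = conv.weight.detach()                              # shape (out, in, k, k)
out_ch = W.shape[0]

Q, _ = torch.linalg.qr(torch.randn(out_ch, out_ch))   # stand-in orthogonal factor
W_rot = (Q @ W.reshape(out_ch, -1)).reshape_as(W)     # rotate output channels

x = torch.randn(1, 16, 8, 8)
y = nn.functional.conv2d(x, W_rot, conv.bias, padding=1)
print(y.shape)  # torch.Size([1, 32, 8, 8])
```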

'We tested the method across various scenarios—from language and generative models to robust convolutional networks. In every case, it performed reliably while using fewer resources. This confirms that the method can be applied effectively to a variety of purposes,' comments Aibek Alanov, Senior Research Fellow at the Centre of Deep Learning and Bayesian Methods, AI and Digital Science Institute, HSE FCS, and leader of the Controllable Generative AI team at FusionBrain, AIRI.
