• A
  • A
  • A
  • ABC
  • ABC
  • ABC
  • А
  • А
  • А
  • А
  • А
Regular version of the site

New Clustering Method Simplifies Analysis of Large Data Sets

New Clustering Method Simplifies Analysis of Large Data Sets

© iStock

Researchers from HSE University and the Institute of Control Sciences of the Russian Academy of Sciences have proposed a new method of data analysis: tunnel clustering. It allows for the rapid identification of groups of similar objects and requires fewer computational resources than traditional methods. Depending on the data configuration, the algorithm can operate dozens of times faster than its counterparts. The study was published in the journal Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia.

Each year, the volume of information requiring processing continues to grow. Data comes from a variety of sources: scientific research, financial reports, medical examinations, and many others. Clustering methods—which group data based on similar characteristics—are used to detect patterns and organise information within such large datasets. These groupings are known as clusters.

One of the most widely used clustering methods is the k-means algorithm. It divides data into a predetermined number of clusters, initially selecting their centres (centroids). However, this method has a limitation: the number of clusters must be known beforehand, which is not always possible when dealing with complex data. Scientists from HSE University and the V.A. Trapeznikov Institute of Control Sciences have proposed a new approach to simplify this process—tunnel clustering. Unlike the k-means method, this algorithm does not require the number of clusters to be set in advance; it determines the necessary number itself by analysing the data structure.

‘The algorithm forms “tunnels” in the data—regions in multidimensional space where objects with similar characteristics group together,’ explained Fuad Aleskerov, Head of the Department of Mathematics at the HSE Faculty of Economic Sciences. ‘Users can choose from three modes of operation: with fixed cluster boundaries, with adaptive boundaries that adjust to the data structure, or a combined approach. This makes the method flexible and suitable for various types of tasks.’

The method was tested on a synthetic (artificially generated) dataset of 100,000 objects, as well as on real-world tasks in public administration and the banking sector.

Visualisation of the original data and the results of tunnel clustering in a four-dimensional parallel coordinates system.
© Aleskerov, F.T., Myachin, A.L. & Yakuba, V.I. Tunnel Clustering Method. Dokl. Math. 110, 474–479 (2024)

The main advantage of the new method is its speed. Unlike classical algorithms that demand significant computational resources, tunnel clustering can, depending on the data configuration, perform the analysis dozens of times faster.

In addition, the researchers introduced the concept of the ‘transition degree’—a parameter indicating how many characteristics of an object must change for it to be classified into a different cluster. This helps assess the clarity of cluster boundaries and identify objects situated at the intersection of different groups.

‘People are generating more and more data, and the pace is only accelerating. According to the latest Digital 2025: Global Overview Report, as of early 2025, there were 5.56 billion internet users—nearly 68% of the global population. Adults spend an average of 6 hours and 38 minutes online each day, communicating, working, watching videos, and consuming content,’ said Alexey Myachin, Senior Research Fellow at the HSE International Centre for Decision Choice and Analysis. ‘Companies that ignore data analysis are losing vast sums of money.’

The authors continue to refine the algorithm, including conducting research into dimensionality reduction, which will help further decrease the time required to identify patterns in data.

The study was carried out with partial support from the Russian Science Foundation.

See also:

Larger Groups of Students Use AI More Effectively in Learning

Researchers at the Institute of Education and the Faculty of Economic Sciences at HSE University have studied what factors determine the success of student group projects when they are completed with the help of artificial intelligence (AI). Their findings suggest that, in addition to the knowledge level of the team members, the size of the group also plays a significant role—the larger it is, the more efficient the process becomes. The study was published in Innovations in Education and Teaching International.

New Models for Studying Diseases: From Petri Dishes to Organs-on-a-Chip

Biologists from HSE University, in collaboration with researchers from the Kulakov National Medical Research Centre for Obstetrics, Gynecology, and Perinatology, have used advanced microfluidic technologies to study preeclampsia—one of the most dangerous pregnancy complications, posing serious risks to the life and health of both mother and child. In a paper published in BioChip Journal, the researchers review modern cellular models—including advanced placenta-on-a-chip technologies—that offer deeper insights into the mechanisms of the disorder and support the development of effective treatments.

Using Two Cryptocurrencies Enhances Volatility Forecasting

Researchers from the HSE Faculty of Economic Sciences have found that Bitcoin price volatility can be effectively predicted using Ethereum, the second-most popular cryptocurrency. Incorporating Ethereum into a predictive model reduces the forecast error to 23%, outperforming neural networks and other complex algorithms. The article has been published in Applied Econometrics.

Administrative Staff Are Crucial to University Efficiency—But Only in Teaching-Oriented Institutions

An international team of researchers, including scholars from HSE University, has analysed how the number of non-academic staff affects a university’s performance. The study found that the outcome depends on the institution’s profile: in research universities, the share of administrative and support staff has no effect on efficiency, whereas in teaching-oriented universities, there is a positive correlation. The findings have been published in Applied Economics.

Physicists at HSE University Reveal How Vortices Behave in Two-Dimensional Turbulence

Researchers from the Landau Institute for Theoretical Physics of the Russian Academy of Sciences and the HSE University's Faculty of Physics have discovered how external forces affect the behaviour of turbulent flows. The scientists showed that even a small external torque can stabilise the system and extend the lifetime of large vortices. These findings may improve the accuracy of models of atmospheric and oceanic circulation. The paper has been published in Physics of Fluids.

Solvent Instead of Toxic Reagents: Chemists Develop Environmentally Friendly Method for Synthesising Aniline Derivatives

An international team of researchers, including chemists from HSE University and the A.N. Nesmeyanov Institute of Organoelement Compounds of the Russian Academy of Sciences (INEOS RAS), has developed a new method for synthesising aniline derivatives—compounds widely used in the production of medicines, dyes, and electronic materials. Instead of relying on toxic and expensive reagents, they proposed using tetrahydrofuran, which can be derived from renewable raw materials. The reaction was carried out in the presence of readily available cobalt salts and syngas. This approach reduces hazardous waste and simplifies the production process, making it more environmentally friendly. The study has been published in ChemSusChem.

How Colour Affects Pricing: Why Art Collectors Pay More for Blue

Economists from HSE University, St Petersburg State University, and the University of Florida have found which colours in abstract paintings increase their market value. An analysis of thousands of canvases sold at auctions revealed that buyers place a higher value on blue and favour bright, saturated palettes, while showing less appreciation for traditional colour schemes. The article has been published in Information Systems Frontiers.

New Method for Describing Graphene Simplifies Analysis of Nanomaterials

An international team, including scientists from HSE University, has proposed a new mathematical method to analyse the structure of graphene. The scientists demonstrated that the characteristics of a graphene lattice can be represented using a three-step random walk model of a particle. This approach allows the lattice to be described more quickly and without cumbersome calculations. The study has been published in Journal of Physics A: Mathematical and Theoretical.

Scientists Have Modelled Supercapacitor Operation at Molecular and Ionic Level

HSE scientists used supercomputer simulations to study the behaviour of ions and water molecules inside the nanopores of a supercapacitor. The results showed that even a very small amount of water alters the charge distribution inside the nanopores and influences the device’s energy storage capacity. This approach makes it possible to predict how supercapacitors behave under different electrolyte compositions and humidity conditions. The paper has been published in  Electrochimica Acta.  The study was supported by a grant from the Russian Science Foundation (RSF).

Designing an Accurate Reading Skills Test: Why Parallel Texts are Important in Dyslexia Diagnosis

Researchers from the HSE Centre for Language and Brain have developed a tool for accurately assessing reading skills in adults with reading impairments. It can be used, for instance, before and after sessions with a language therapist. The tool includes two texts that differ in content but are equal in complexity: participants were observed to read them at the same speed, make a similar number of errors, and understand the content to the same degree. Such parallel texts will enable more accurate diagnosis of dyslexia and better monitoring of the effectiveness of interventions aimed at addressing it. The paper has been published in Educational Studies.