Adat- és Számításintenzív Tudományok Kutatócsoport https://wigner.hu/hu hu 2022_Data and Compute Intensive Sciences Research Group https://wigner.hu/hu/node/2494 <span class="field field--name-title field--type-string field--label-hidden">2022_Data and Compute Intensive Sciences Research Group</span> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><h4><strong>2022</strong></h4> <p><strong>Activity of our group in 2022</strong> — The Data and Compute Intensive Sciences Research Group is a relatively new group that started its activity last year. The group has three members.</p> <p>Our primary goal is to transfer physical ideas to Artificial Intelligence (AI), or Data Science in general. The underlying physical discipline is the Renormalization Group in particle and statistical physics. Its ideas can be generalized to other, more complex systems. The concepts of relevant and irrelevant quantities and of “running” couplings are also meaningful in, for example, pattern recognition.</p> <p>Last year we succeeded in establishing the basic general ideas in several publications.</p> <p>This year we pursued this research by studying the role of entropy in this context. Understanding a topic is equivalent to having an efficient representation of the data. This means that we should store only the minimal information about the system, which in turn means that the representation should be ordered according to the given data. Intuitively, the corresponding measure of disorder, the entropy, should decrease as we approach better and better representations. This intuition was cast into a mathematically clean formalism in Refs. [<a href="https://m2.mtmt.hu/gui2/?mode=browse&params=publication;32650968">3</a>] and [<a href="https://m2.mtmt.hu/gui2/?mode=browse&params=publication;33620208">4</a>].</p> <p>After laying down the theoretical background, we started to apply our method to different systems. 
We studied time series from various sources, looked for the features that are suitable for identifying them, and used these features for classification or even for predicting future data.</p> <p>One such application aimed to classify recorded human motions. Volunteers performed seven types of motion, which were characterized by the positions of three markers. The task was to determine the type of motion from the marker data. We solved this problem by setting up a feature space from the linear laws found among the time-dependent marker positions. After this linear law-based feature space transformation (LLT) we applied standard classification tools (such as random forest, KNN or SVM). With the best KNN algorithm we reached error-free classification, for the first time in the literature. We published the results in Ref. [<a href="https://m2.mtmt.hu/gui2/?mode=browse&params=publication;33195778">1</a>].</p> <p>Another application concerned nonlinear laws. We studied mechanical systems with few degrees of freedom, tried to find their equations of motion, and used the detected laws to predict the continuation of the motion. Remarkably, the numerically determined second-order equation of motion was not stable enough to give a long-time prediction for the behaviour of the system; we also needed to determine the conserved quantities as independent variables and to restrict the time evolution so that it respects these conservation laws. We studied integrable as well as chaotic systems, and found that our method worked equally well in both cases. The study was published in Ref. [<a href="https://m2.mtmt.hu/gui2/?mode=browse&params=publication;32915930">2</a>].</p> <p>An ongoing project examines ECG signals in order to separate normal and ectopic beats. We use linear laws here as well. The data analysis raises many issues, for example separating the features of individual patients from the general features. 
Here the success rate is about 96%, which is a state-of-the-art result. We plan to publish our work at the beginning of 2023.</p> <p>Last, but not least, we studied stochastic systems with linear laws. A finite-state Markov chain can be uniquely characterized by a linear law applied to the embedded autocorrelation function. From the eigenvalues of its PCA matrix we can determine the number of (effective) states in the Markov chain. We used this method to analyse historical cryptocurrency price movements. We found time intervals in which the dimensionality of the Markov chain governing the cryptocurrency market prices changed. This method is promising for detecting suspicious price movements in more general settings, too. Our findings were published in the preprint Ref. [<a href="https://m2.mtmt.hu/gui2/?mode=browse&params=publication;32655252">5</a>], which is still awaiting journal publication.</p> <p>In the group we also pursued the study of the renormalization group in physics. We examined the possibility of describing spontaneous symmetry breaking without the use of classical fields, only with the help of the running of the coupling constants of the effective action. As it turned out, the symmetry can be represented on the coupling constant space and, as a consequence, the fixed points (or partial fixed points) should form a representation space of the underlying symmetries. Thus the symmetric and broken phases, which belong to different representations of the symmetry, can be associated with the partial fixed point structure of these models. The study and its results are published in Ref. [<a href="https://m2.mtmt.hu/gui2/?mode=browse&params=publication;33023687">6</a>].</p> <p>MT Kurbucz also carried out independent studies with other collaborators. 
Among his publications, Ref. [<a href="https://m2.mtmt.hu/gui2/?mode=browse&params=publication;32865101">7</a>] is particularly notable, since it was published in the renowned journal Knowledge-Based Systems.</p> <p>We have several exciting plans for 2023. We plan to continue the study of stochastic systems and to publish some of our studies. We will try to pursue the physical Functional Renormalization Group (FRG) studies. We will also try to open a new topic that is a natural continuation of the work on few-degree-of-freedom time series: the application of our method to computer vision problems. In particular, we built a robot last year, and we will try to teach it autonomous orientation in its surroundings. It is a challenging but also very interesting task, which we share with some outstanding high school students who participate in this project.</p> <p><br /> References:<br /> 1.    MT Kurbucz, P Pósfay, A Jakovác: Facilitating time series classification by linear law-based feature space transformation, Scientific Reports 12 (1), 1-7, IF: 4.996<br /> 2.    A Jakovác, MT Kurbucz, P Pósfay: Reconstruction of observed mechanical motions with Artificial Intelligence tools, New J. Phys. 24 073021, IF: 3.729<br /> 3.    TS Biró, A Jakovác: Entropy of Artificial Intelligence, Universe 8 (1), 53, IF: 2.813<br /> 4.    A Jakovác, A Telcs: A Note on Representational Understanding, Entropy 24 (9), 1313, IF: 2.738<br /> 5.    MT Kurbucz, P Pósfay, A Jakovác: Linear Laws of Markov Chains with an Application for Anomaly Detection in Bitcoin Prices, arXiv preprint arXiv:2201.09790<br /> 6.    A Jakovác, P Mati, P Pósfay: Spontaneous Symmetry Breaking without classical fields: a Functional Renormalization Group approach, Phys. Rev. D 106, 025017, IF: 5.296<br /> 7.    ZT Kosztyán, MT Kurbucz, AI Katona: Network-based dimensionality reduction of high-dimensional, low-sample-size datasets, Knowledge-Based Systems, 109180, IF: 8.038<br /> 8.    
MT Kurbucz: Modeling the Social Determinants of Official COVID-19 Reports in the Early Stages of the Pandemic, Journal of Applied Social Science 16 (1), 356-363, IF: 0.52<br /> 9.    ZT Kosztyán, E Bogdány, I Szalkai, MT Kurbucz: Impacts of synergies on software project scheduling, Annals of Operations Research 312 (2), 883-908, IF: 4.854<br /> 10.    MT Kurbucz, AI Katona: eudistance: Distance calculator for the different levels of European NUTS regions, Software Impacts, 100327, IF: 1.8<br /> 11.    ZT Kosztyán, F Király, MT Kurbucz: Analysis of ownership network of European companies using gravity models, Applied Network Science 7 (1), 1-31, IF: 2.646<br /> 12.    MT Kurbucz: hdData360r: A high-dimensional panel data compiler for governance, trade, and competitiveness indicators of World Bank Group platforms, SoftwareX 21, 101297, IF: 2.868</p> <p> </p> </div> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="https://wigner.hu/hu/user/124" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Pentek Csilla</span></span> <span class="field field--name-created field--type-created field--label-hidden">Fri, 02/10/2023 - 10:15</span> <div class="field field--name-field-ev field--type-datetime field--label-above"> <div class="field__label">Year</div> <div class="field__item"><time datetime="2023-02-10T12:00:00Z" class="datetime">Fri, 02/10/2023 - 12:00</time> </div> </div> Fri, 10 Feb 2023 09:15:18 +0000 Pentek Csilla 2494 at https://wigner.hu 2021_Data and Compute Intensive Sciences Research Group https://wigner.hu/hu/node/2321 <span class="field field--name-title field--type-string field--label-hidden">2021_Data and Compute Intensive Sciences Research Group</span> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><h4><strong>2021</strong></h4> <p><strong>Activity of our group in 2021 </strong>— The Data and Compute Intensive 
Sciences Research Group is a new (or revitalized) group that started actual activity only this year. This small group currently has three members; one of them, Péter Pósfay, joined us in May.<br /> Our primary goal is to transfer physical ideas to Artificial Intelligence (AI) and Data Science in general. The physical discipline that some of us have pursued for a long time is the Renormalization Group in particle and statistical physics. The ideas of that discipline can be generalized to other, more complex systems. The concepts of relevant and irrelevant quantities and of “running” couplings are also meaningful in, for example, pattern recognition.</p> <p>The basic ideas of these studies were first published on arXiv in 2020 [<a href="https://arxiv.org/abs/2010.13482">1</a>]. The main observation was that the simple task of determining whether an item of a large (finite) set X is an element of a subset Y⊂X can be performed numerically by defining proper features. Some features are constant throughout the subset Y; these are called relevant features, or simply laws. Others are uniformly distributed over Y; these are the irrelevant features. As we have shown, all subsets can be effectively characterized by independent features, and all AI tasks (such as classification, regression, compression, encoding, etc.) can be easily done once we know the correct features. We also demonstrated that understanding, both in the natural sciences and in artificial intelligence, can be described along these concepts.</p> <p>This year the general ideas laid down in [<a href="https://arxiv.org/abs/2010.13482">1</a>] were refined and applied to more practical problems. One of these was the study of linear laws in time series [<a href="https://arxiv.org/abs/2104.10970">2</a>]. We demonstrated how linear laws can be determined and used to reproduce the measured data, and how these techniques are related to principal component analysis (PCA). 
We applied the results to the lossy compression of musical data.</p> <p>Several further projects are currently under active study. One belongs to the broader project of determining the causality of stochastic (sub)processes. Here we do both theoretical work and applications to practical problems, in particular to financial data. Another project deals with the entropy of learning, and tries to find a formula whose value decreases during a proper general learning process. A further project studies the electrocardiogram (ECG); the goal is to define relevant features of the different heartbeat types, thus allowing the classification of heartbeats. Last, but not least, we try to determine the dynamic laws of an observed motion, set up a recursion to reproduce it, and continue the motion in a plausible way. We hope that in next year’s annual report we can report on the success of all of these projects.</p> <p>We are also actively seeking industrial partnerships in topics that concern us. 
There are several partners with whom cooperation is conceivable.</p> </div> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="https://wigner.hu/hu/user/124" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Pentek Csilla</span></span> <span class="field field--name-created field--type-created field--label-hidden">Tue, 08/30/2022 - 09:43</span> <div class="field field--name-field-ev field--type-datetime field--label-above"> <div class="field__label">Year</div> <div class="field__item"><time datetime="2022-08-30T12:00:00Z" class="datetime">Tue, 08/30/2022 - 12:00</time> </div> </div> Tue, 30 Aug 2022 07:43:42 +0000 Pentek Csilla 2321 at https://wigner.hu 2020_Data and Compute Intensive Sciences https://wigner.hu/hu/node/1728 <span class="field field--name-title field--type-string field--label-hidden">2020_Data and Compute Intensive Sciences</span> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><h4><strong>2020</strong></h4> <p><strong>Artificial Intelligence in physics </strong>— This group was re-formed in September with only two members: one is the group leader, and the other, István Csabai, is a part-time collaborator. In this initial period there was not enough time to achieve significant results.<br /> The main profile of the research group is big data analysis and artificial intelligence (AI). Our group follows a novel approach to AI, which is closely related to the renormalization group ideas in physics<a href="https://arxiv.org/abs/2010.13482"> [1]</a>. In contrast to the probabilistic interpretation of neural network results, we try to determine the most useful features of the input. A feature is a property of a set of inputs (a class) that either is constant (say zero) for all elements of the class or takes different values on different elements of the class. 
Features of the first type are called relevant, those of the second type irrelevant.</p> <p>In physical systems a relevant feature is called a <em>physical law</em>. In a time series, a law can be cast into the form of a time evolution rule, which leads to predictions for the system. In general, each law singles out the subset of all possible inputs on which it is fulfilled; this subset contains the class on which we determined the feature. Several relevant features together select the intersection of the individual subsets, and thus, with an appropriate number of relevant features, we can uniquely identify the desired subset.</p> <p>The irrelevant features are useful when we know that the input belongs to a given class and we want to tell its individual elements apart. Since the relevant features are constant over the whole class, they play no role in this task. Therefore the irrelevant features provide a lossless compression of the given class.</p> <p>In our current projects these ideas will be used to analyze time series (outlier identification, large-step time evolution) and in image recognition tasks. 
Since then our group has been expanding; two part-time colleagues will work with us next year.</p> </div> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="https://wigner.hu/hu/user/124" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Pentek Csilla</span></span> <span class="field field--name-created field--type-created field--label-hidden">Mon, 02/22/2021 - 15:46</span> <div class="field field--name-field-ev field--type-datetime field--label-above"> <div class="field__label">Year</div> <div class="field__item"><time datetime="2020-01-02T12:00:00Z" class="datetime">Thu, 01/02/2020 - 12:00</time> </div> </div> Mon, 22 Feb 2021 14:46:29 +0000 Pentek Csilla 1728 at https://wigner.hu 2019_Data and Compute Intensive Sciences https://wigner.hu/hu/komputacios-tudomanyok-osztalya/adat-es-szamitasintenziv-tudomanyok-kutatocsoport/research <span class="field field--name-title field--type-string field--label-hidden">2019_Data and Compute Intensive Sciences</span> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><h4><strong>2019</strong></h4> <p><strong>Machine learning and data analytics platform for infectious disease genetics.</strong> — Our group’s focus is to foster research in data- and computation-intensive areas. The last two decades have seen an unprecedented change in almost all areas of science. Before that, most disciplines were limited by the scarcity of experimental data. The exponential pace of microelectronics development has changed this, on the one hand by making high-throughput sensors and digital instruments available, and on the other by providing high-speed computers with large storage and fast interconnecting networks. 
Beyond the almost limitless opportunities there are demanding challenges, too: how to handle the data avalanche from experiments, how to get the most out of information technology in various scientific disciplines, and how to understand and manage the ever-growing complexity of the computational system itself. We study computer networks and systems as if they were “natural phenomena” and, continuously following the technologies, use them to analyze scientific data in various fields from genomics to cosmology. </p> <p>We are part of a large European H2020 project, COMPARE, in which bioinformatics tools are developed for outbreak detection. The health of humans and animals around the world is increasingly under threat due to new and recurring epidemics and foodborne disease outbreaks, which place pressure on health services and the production of livestock. These epidemics also reduce consumer confidence in food and negatively impact trade and food security. The longer it takes from the start of an outbreak of, for example, Ebola, influenza or salmonella until it is detected and stopped, the greater the consequences. The most important factor in limiting the consequences and costs of such outbreaks is the ability to quickly identify the disease-causing microorganisms. There is also a need for knowledge about the mechanisms that cause the disease and about how the bacteria are transmitted to and between humans. The goal of the COMPARE project is a better surveillance system for infectious diseases, to speed up the detection of and response to disease outbreaks among humans and animals worldwide using new genome technology. Our group is responsible for the advanced database and data analysis system which will store, analyse and share the genomic data collected by researchers all over the world. 
We develop a “virtual research environment”, where interested partners can log in and use the already installed tools, software and data together with their own to do research (Fig. 1). Wigner Cloud is used as the hardware backend for developing the portal. We are also involved in the development of machine learning methods, such as artificial neural networks, for inferring antibiotic resistance from the genetic sequences of bacteria.</p> <img alt="data and compute intensive science" data-entity-type="file" data-entity-uuid="0afefd5b-7978-49eb-b489-9d65a0852c31" src="https://wigner.hu/sites/default/files/inline-images/datascience.png" width="500" class="align-center" /> <p class="text-align-center"><em>Figure 1. Snapshot of pathogen genome data analysis in the COMPARE Data Hub. </em> </p> <p> </p> </div> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span lang="" about="https://wigner.hu/hu/user/124" typeof="schema:Person" property="schema:name" datatype="" xml:lang="">Pentek Csilla</span></span> <span class="field field--name-created field--type-created field--label-hidden">Wed, 07/01/2020 - 13:58</span> <div class="field field--name-field-ev field--type-datetime field--label-above"> <div class="field__label">Year</div> <div class="field__item"><time datetime="2019-01-02T12:00:00Z" class="datetime">Wed, 01/02/2019 - 12:00</time> </div> </div> Wed, 01 Jul 2020 11:58:04 +0000 Pentek Csilla 1510 at https://wigner.hu