## computational neuroscience

While network science offers exceptional tools for neuroscience, they are limited by the assumption that the system of interest fundamentally consists of pairwise interactions only. Algebraic topology, on the other hand, provides powerful tools for studying complex systems with higher-order interactions. I use algebraic topology to study neural activity structure.

In particular, I am interested in how multiple brain regions collectively process information. For example, it is known that information is projected from the primary visual cortex (V1) to higher visual areas such as the anterolateral area (AL). However, not much is understood regarding the nature of the projection: How selective is the information distributed to higher visual areas? How much of the information encoded is a result of the projection?

To address the above questions, I developed a framework for comparing neural activity structures in multiple brain regions. Topological methods can be applied to the correlation matrix of spike trains to extract encoded information in each brain region, as illustrated in the following figure. Given spike trains that have been recorded simultaneously from multiple brain regions, I used spike train correlations across brain regions to trace the encoded information from one brain region to another. The algorithm allows researchers to better understand the nature of the information distribution process across brain regions.

*Correlations among spike trains can be used to build a topological space representing the information encoded in the brain regions. The topological features are summarized in the barcodes.*
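As a rough illustration of the first step, the sketch below turns binned spike counts into a correlation-based dissimilarity and reads off a dimension-0 barcode. It is a minimal toy under stated assumptions, not the framework itself: the function names (`pearson`, `dissimilarity_matrix`, `h0_barcode`), the choice of `1 - correlation` as dissimilarity, and the toy spike counts are all hypothetical, and only connected components (dimension 0) are tracked, whereas higher-dimensional features require a full persistent homology library.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def dissimilarity_matrix(binned):
    """Assumed dissimilarity: 1 - correlation, so highly correlated neurons are close."""
    n = len(binned)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = 1.0 - pearson(binned[i], binned[j])
    return d


def h0_barcode(d):
    """Dimension-0 bars of the Vietoris-Rips filtration on d.

    Every point is born at 0. Processing edges in increasing order, a
    component dies each time an edge joins two distinct components.
    """
    n = len(d)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    edges = sorted((d[i][j], i, j) for i, j in combinations(range(n), 2))
    bars = []
    for weight, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, weight))
    bars.append((0.0, math.inf))  # one component never dies
    return bars


# Toy data (hypothetical): four neurons, six time bins of spike counts.
binned = [
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
]
bars = h0_barcode(dissimilarity_matrix(binned))
```

In this toy example, neurons 0 and 1 fire together, as do neurons 2 and 3, so two bars die immediately and one bar persists until the two groups merge. The long finite bar is the kind of feature a barcode is meant to highlight.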

Paper and code are under preparation.

*Current project at University of Pennsylvania (2019-2020). Joint work with Chad Giusti (U. Delaware) and Robert Ghrist (U. Penn).*

## RNA structure sampling and clustering

A ribonucleic acid (RNA) molecule can form complex structures through intramolecular base pairing. Some classes of RNAs regulate biological functions by changing their conformations. An example is illustrated below.

*The SAM-III riboswitch folds into two distinct secondary structures. It regulates gene expression by exposing or sequestering the SD sequence, which controls translation.*

Identifying the multiple structures of an RNA can bring therapeutic advancements for RNA viruses. A popular approach is to sample low-energy structures from the nearest-neighbor thermodynamic model. Most algorithms follow the general flow of **sampling**, **clustering**, and reporting **cluster representatives**.

I worked on improving the **clustering** aspect of an RNA structure prediction algorithm called profiling. The current method produced too many clusters with negligible biological differences. I proposed algorithmic ways to identify clusters that should be merged based on structural similarity. The enhanced version of profiling is under development by the Georgia Tech Discrete Mathematics and Molecular Biology group.
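To make the idea of merging clusters by structural similarity concrete, here is a minimal sketch. It is not the criterion used in profiling: it assumes each cluster is summarized by one representative in dot-bracket notation (without pseudoknots), uses base-pair distance as a stand-in similarity measure, and merges any representatives within a hypothetical `threshold`. The names `base_pairs`, `bp_distance`, and `merge_clusters` are illustrative only.

```python
def base_pairs(dot_bracket):
    """Set of (i, j) index pairs encoded by a well-formed dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def bp_distance(s, t):
    """Number of base pairs present in exactly one of the two structures."""
    return len(base_pairs(s) ^ base_pairs(t))


def merge_clusters(representatives, threshold):
    """Group cluster indices whose representatives are within `threshold`.

    Merging is transitive (single linkage), implemented with union-find.
    """
    n = len(representatives)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if bp_distance(representatives[i], representatives[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy representatives (hypothetical): two near-identical hairpins and one
# structure with two separate hairpins.
reps = [
    "((((....))))",
    ".(((....))).",
    "((..))((..))",
]
merged = merge_clusters(reps, threshold=2)
```

Here the first two hairpins differ by a single base pair and end up in one group, while the third structure stays separate. Because single linkage can chain together structures that are pairwise far apart, a real merging rule would likely need a stricter criterion.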

I also examined the prospect of using current methods to identify new multimodal RNAs. I found that there is a class of RNAs (kinetic riboswitches) that is difficult to detect with current sampling methods. I proposed a simple co-transcriptional simulation method to identify the multimodality of such RNAs. The results have been published in this paper.

*Georgia Tech (2018-2019), joint work with Christine Heitsch (Georgia Tech) and Alain Laederach (UNC).*

## multiscale feature detection via distributed topological data analysis

For my PhD dissertation, I worked on applications of topology to data science. I used **cosheaves** and **spectral sequences** to compute **persistence** in a distributed manner. I applied such distributed computation to study **multi-density data** and recovered the information lost in persistence diagrams.

For example, consider the following point cloud and its corresponding persistence diagram in dimension one.

By observing the persistence diagram, one would conclude that there is one significant feature. However, one can see from the point cloud that there are small but significant features that are densely sampled. My construction of distributed computation allows one to identify such significant features that are neglected by traditional methods.

Here is a 30-minute video of my presentation at the IMA special workshop on Bridging Statistics and Sheaves.

The paper can be found on arXiv. Here is a copy of my PhD dissertation.

*University of Pennsylvania (2013-2018), PhD dissertation. Joint work with Robert Ghrist (U. Penn).*