A Tool for Analyzing High-Dimensional Data Using the oneAPI AI Analytics Toolkit

This project applies the oneAPI AI Analytics Toolkit to medical science. We will explore single-cell data (e.g. single-cell RNA sequencing, scRNA-seq).

We will port Clustergrammer2 to the AI Analytics Toolkit. Clustergrammer2 is a web-based tool for visualizing and analyzing high-dimensional data (e.g. single-cell RNA sequencing data) as interactive and shareable heatmaps. It produces highly interactive visualizations that enable intuitive exploration of high-dimensional data and has several optional biology-specific features (e.g. enrichment analysis; see Biology-Specific Features) to facilitate the exploration of gene-level biological data.

We will explore gene expression data, which is widely used in studying diseases such as cancer. The information obtained from exploring the heatmaps is useful for studying where gene mutations have occurred. Porting Clustergrammer2 to the AI Analytics Toolkit lets us interactively explore data from 2,700 PBMCs (peripheral blood mononuclear cells) obtained from the 10x Genomics dataset. We will use Intel Optimized Python from the AI Analytics Toolkit and run the programs on Intel DevCloud.

We will also use an external dataset for exploration known as CIBERSORT, which provides an estimation of the abundances of member cell types in a mixed population using gene expression data. We will load the data in sparse matrix format. The dataset consists of 32 thousand genes and 2,700 single cells.
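Loading a count matrix in sparse form might look like the sketch below. It assumes the matrix is stored as a Matrix Market (`.mtx`) file with genes as rows and cells as columns, which the project description does not state. The file name `matrix.mtx`, the helper `load_sparse_counts`, and the toy data are hypothetical and exist only so the example runs on its own.

```python
import os
import tempfile

import numpy as np
from scipy import sparse
from scipy.io import mmread, mmwrite


def load_sparse_counts(path):
    """Read a Matrix Market file and return a CSC matrix (genes x cells).

    Hypothetical helper: the real dataset layout is an assumption.
    """
    return sparse.csc_matrix(mmread(path))


# Toy stand-in for the real data: 50 genes x 20 cells, mostly zeros.
rng = np.random.default_rng(0)
dense = rng.poisson(lam=0.2, size=(50, 20))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.mtx")  # hypothetical file name
    mmwrite(path, sparse.coo_matrix(dense))
    counts = load_sparse_counts(path)

# Fraction of entries that are zero; high values are why a sparse format pays off.
sparsity = 1.0 - counts.nnz / (counts.shape[0] * counts.shape[1])
print(counts.shape, round(sparsity, 2))
```

A column-oriented (CSC) layout is chosen here because per-cell operations such as summing each cell's counts read whole columns; a row-oriented (CSR) layout would suit per-gene operations better.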

Using Intel Optimized Python, we will normalize the gene expression (GEX) data and find the top expressing genes. We will then apply an ArcSinh transform and Z-score normalization. After that, we load the data into the Clustergrammer2 build that we ported to the AI Analytics Toolkit and explore the resulting interactive heatmaps.
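The preprocessing steps above can be sketched with NumPy as follows. This is a minimal illustration, not the project's actual code: the function names, the per-cell scaling target of 10,000, the choice of total expression as the ranking for "top expressing" genes, and the ArcSinh cofactor of 5 are all assumptions, and the input is random toy data.

```python
import numpy as np


def normalize_cells(counts, target_sum=1e4):
    """Scale each cell (column) so its counts sum to target_sum (assumed value)."""
    totals = counts.sum(axis=0, keepdims=True)
    totals[totals == 0] = 1  # avoid division by zero for empty cells
    return counts / totals * target_sum


def top_expressing_genes(mat, n_top):
    """Return row indices of the n_top genes with the highest total expression."""
    totals = mat.sum(axis=1)
    return np.argsort(totals)[::-1][:n_top]


def arcsinh_transform(mat, cofactor=5.0):
    """Apply ArcSinh after dividing by a cofactor (assumed value) to compress large values."""
    return np.arcsinh(mat / cofactor)


def zscore_rows(mat):
    """Z-score each gene (row) across cells: zero mean, unit standard deviation."""
    mean = mat.mean(axis=1, keepdims=True)
    std = mat.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant rows stay at zero instead of dividing by zero
    return (mat - mean) / std


# Toy genes x cells count matrix standing in for the real GEX data.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(100, 30))

norm = normalize_cells(counts)
keep = top_expressing_genes(norm, n_top=20)
z = zscore_rows(arcsinh_transform(norm[keep]))
print(z.shape)
```

The resulting matrix `z` has one row per selected gene and one column per cell, which is the shape a heatmap tool would expect as input.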

Here are the features of Clustergrammer2:

  • Zooming and panning: users can zoom into and pan across the heatmap by scrolling and dragging.
  • Mouseover interactions: mousing over elements in the heatmap brings up additional information in tooltips.
  • Row and column reordering
  • Interactive dimensionality reduction: dimensionality reduction is a useful data analysis technique that is often used to reduce high-dimensional datasets down to a number of dimensions that can be visualized.
  • Interactive dendrogram: clustergrams typically have dendrogram trees (for both rows and columns) to depict the hierarchy of row and column clusters produced by hierarchical clustering. The height of the branches in the dendrogram depicts the distance between clusters. Clustergrammer depicts this hierarchical tree one slice at a time using trapezoids.
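The hierarchical clustering behind row reordering and the dendrogram can be illustrated with SciPy. This is a sketch under stated assumptions, not Clustergrammer2's own implementation: the average linkage, the Euclidean metric, and the toy matrix are chosen only for the example. Cutting the tree into a fixed number of clusters mimics viewing the dendrogram one slice at a time.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage

# Toy matrix: three well-separated groups of four rows each.
rng = np.random.default_rng(0)
data = np.vstack(
    [rng.normal(loc=center, scale=0.1, size=(4, 6)) for center in (0.0, 5.0, 10.0)]
)

# Build the row dendrogram (linkage method and metric are assumptions).
tree = linkage(data, method="average", metric="euclidean")

# Leaf order of the dendrogram: one way to reorder heatmap rows so that
# similar rows sit next to each other.
row_order = leaves_list(tree)

# Slice the tree at two levels: a coarse cut (2 clusters) and a finer cut (3 clusters).
slices = {k: fcluster(tree, t=k, criterion="maxclust") for k in (2, 3)}

print(row_order)
print(slices[3])
```

The same calls applied to the transposed matrix would give the column ordering and column clusters.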

Uses

  • Visualizing bulk gene expression data
  • Cancer Cell Line Encyclopedia gene expression data
  • Lung cancer post-translational modification and gene expression regulation