Improving the t-SNE Algorithms for Cytometry and Other Technologies: Cen-Se' Mapping

Research Article

Charles Bruce Bagwell, Christ


SNE methods are a set of 9 to 10 interconnected algorithms that map high-dimensional data into low-dimensional space while minimizing loss of information. Each step in this process is important for producing high-quality maps. Cense′™ mapping not only enhances many of the steps in this process but also fundamentally changes the underlying mathematics to produce high-quality maps. The key mathematical enhancement is to leverage the Cauchy distribution for creating both high-dimensional and lowdimensional similarity matrices. This simple change eliminates the necessity of using perplexity and entropy and results in maps that optimally separate clusters defined in high-dimensional space. It also eliminates the loss of cluster resolution commonly seen with t-SNE with higher numbers of events. There is just one free parameter for Cen-se′ mapping, and that parameter rarely needs to change. Other enhancements include a relatively low memory footprint, highly threaded implementation, and a final classification step that can process millions of events in seconds. When the Cen-se′ mapping system is integrated with probability state modeling, the clusters of events are positioned in a reproducible manner and are colored, labeled, and enumerated automatically. We provide a step-by-step, simple example that describes how the Cen-se′ method works and differs from the t-SNE method. We present data from several experiments to compare the two mapping strategies on high-dimensional mass cytometry data. We provide a section on information theory to explain how the steepest gradient equations were formulated and how they control the movement of the low-dimensional points as the system renders the map Since existing implementations of the t-SNE algorithm can easily be modified with many of these enhancements, this work should result in more effective use of this very exciting and far-reaching new technology.

Relevant Publications in Biometrics & Biostatistics