Single-Cell Clustering Based on Shared Nearest Neighbor and Graph Partitioning


ORIGINAL RESEARCH ARTICLE

Xiaoshu Zhu1,2 · Jie Zhang2 · Yunpei Xu1 · Jianxin Wang1 · Xiaoqing Peng3 · Hong‑Dong Li1

Received: 3 September 2019 / Revised: 23 December 2019 / Accepted: 26 December 2019
© International Association of Scientists in the Interdisciplinary Areas 2020

Abstract

Clustering of single-cell RNA sequencing (scRNA-seq) data enables the discovery of cell subtypes, which helps in understanding and analyzing disease processes. Determining edge weights is an essential component of graph-based clustering methods. Although several graph-based clustering algorithms for scRNA-seq data have been proposed, they are generally based on k-nearest neighbor (KNN) and shared nearest neighbor (SNN) graphs and do not consider the structural information of the graph. Here, to improve clustering accuracy, we present a novel method for single-cell clustering, called structural shared nearest neighbor-Louvain (SSNN-Louvain), which integrates the structural information of the graph with module detection. In SSNN-Louvain, the edge weight is defined from the distance between a node and its shared nearest neighbors together with the ratio of the number of shared nearest neighbors to the number of nearest neighbors, thereby incorporating the structural information of the graph. A modified Louvain community detection algorithm is then proposed and applied to identify modules in the graph; each community essentially represents a cell subtype. Notably, the proposed method combines the advantages of the SNN graph and community detection without requiring any parameter to be tuned other than the number of neighbors. To test the performance of SSNN-Louvain, we compare it with five existing methods (nonnegative matrix factorization, single-cell interpretation via multi-kernel learning, SNN-Cliq, Seurat and PhenoGraph) on 16 real datasets. The experimental results show that our approach achieves the best average performance on these datasets.

Keywords  Single-cell RNA-seq · Similarity · Clustering · Shared nearest neighbor · Louvain community detection
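The idea of weighting edges by the fraction of shared nearest neighbors can be illustrated with a minimal Python sketch. This is not the SSNN-Louvain weight itself, whose exact formula (including the distance term) is not given in the abstract. It assumes a simplified weight, |KNN(i) ∩ KNN(j)| / k, Euclidean distance, and hypothetical helper names (knn_sets, snn_ratio_edges). The resulting weighted edges are the kind of input a community detection step would then partition.

```python
from math import dist


def knn_sets(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_ratio_edges(points, k):
    """Assumed simplified weight: |KNN(i) & KNN(j)| / k; edges with no shared neighbor are dropped."""
    nn = knn_sets(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nn[i] & nn[j])
            if shared:
                edges[(i, j)] = shared / k
    return edges


# Toy "cells" in two dimensions: two well-separated groups of three cells each.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
edges = snn_ratio_edges(cells, k=2)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

On this toy input, edges appear only between cells of the same group, so the weighted graph already separates the two groups before any community detection is applied.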

Electronic supplementary material  The online version of this article (https://doi.org/10.1007/s12539-019-00357-4) contains supplementary material, which is available to authorized users.

* Xiaoqing Peng [email protected]
* Hong‑Dong Li [email protected]

1  Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, Hunan, China
2  School of Computer Science and Engineering, Yulin Normal University, Yulin 537000, Guangxi, China
3  School of Life Science, Central South University, Changsha 410083, Hunan, China

1 Introduction



Unlike bulk RNA sequencing, which measures the average expression of a cell population and may lose differential expression information among cells, single-cell RNA sequencing (scRNA-seq) measures individual cells and facilitates understanding cell hetero