GSA: a GPU-accelerated structure similarity algorithm and its application in progressive virtual screening



FULL-LENGTH PAPER

Xin Yan · Qiong Gu · Feng Lu · Jiabo Li · Jun Xu

Received: 20 June 2012 / Accepted: 8 October 2012 © Springer Science+Business Media Dordrecht 2012

Abstract A new algorithm is proposed for accelerating chemical structure similarity search by means of graphics processing unit (GPU) technology. Experiments demonstrate that the new algorithm is on average more than 120 times faster than an implementation on a conventional central processing unit (CPU). To test the generality of the new algorithm, it was applied in seven progressive virtual screening experiments on NCI/DTP 60 human cancer cell line data. The progressive virtual screening results show that the technology can select 10–20 % of the compounds for screening and still recover 70–80 % of the intrinsic hits for a given chemical library and target.

Keywords Structure similarity · GPU · Progressive virtual screening

Electronic supplementary material The online version of this article (doi:10.1007/s11030-012-9403-0) contains supplementary material, which is available to authorized users.

X. Yan · Q. Gu · F. Lu · J. Xu (B) School of Pharmaceutical Sciences & Institute of Human Virology, Sun Yat-sen University, 132 East Circle at University City, Guangzhou 510006, China e-mail: [email protected]

J. Li Accelrys Inc., 10188 Telesis Ct# 100, San Diego, CA 92121-4779, USA

Introduction In the drug discovery process, one objective is to identify chemical structures that are not necessarily substructurally equivalent (i.e., a substructural match) to a query structure (QG) but are nevertheless similar to it. In practice, for a given QG, all structures in a chemical library have to be searched and compared with the QG, and their structural similarities to the QG have to be calculated. The final hits are determined by a similarity threshold. Substructure search can be accelerated with structure screens, which reduce the number of atom-by-atom comparisons. Similarity search, however, has to compare every chemical structure with the QG. Therefore, structure similarity search is slow for a large chemical compound library.

In ligand-based virtual drug screening, compounds in a chemical library are ranked by their similarity to one or more known compounds. The top-ranked compounds have higher priority for biological activity screening [1]. Many similarity search algorithms have been reported [2–7], and the most common similarity measurements involve the use of 2D fingerprints and the Tanimoto coefficient [1]. A fingerprint is usually a binary or integer vector in which each component represents the presence/absence or the number of occurrences of a structural fragment/feature in a given molecular structure. There are two main classes of fingerprint [1]: the dictionary-based approach and the molecule-based approach. The former involves a pre-defined list of fragments; a molecule is checked for the presence or the number of occurrences of each of the fragments