• A Comparison of Clustering Algorithms for the Study of Antibody Loop Structures

      Obradovic, Zoran; Dragut, Eduard Constantin; Vucetic, Slobodan; Zeng, Qiang; Dunbrack, Roland L. (Temple University. Libraries, 2017)
      Antibodies are the fundamental agents of the immune system. The Complementarity Determining Regions (CDRs) act as the functional surfaces through which antibodies bind their targets. These CDR structures, which are peptide loops, are diverse in both amino acid sequence and structure. In 2011, we surveyed a database of CDR loop structures using the affinity propagation clustering algorithm of Frey and Dueck. With the growth in the number of structures deposited in the Protein Data Bank, the number of antibody CDRs has since approximately tripled. In addition, although the affinity propagation clustering in 2011 was successful in many ways, the methods used left too much noise in the data, and the algorithm tended to clump diverse structures together. This work revisits the antibody CDR clustering problem and uses five different clustering algorithms to categorize the data. Three of the clustering algorithms use DBSCAN but differ in the data comparison functions used: one uses the sum of the dihedral distances, another uses the supremum of the dihedral distances, and the third uses the Jarvis-Patrick shared nearest neighbor similarity, where the nearest neighbor lists are compiled using the sum of the dihedral distances. The other two clustering methods use the k-medoids algorithm, one of which has been modified to include pairwise constraints. Overall, DBSCAN using the sum of the dihedral distances and DBSCAN using the supremum of the dihedral distances produced the best clustering results as measured by the average silhouette coefficient, while the constrained k-medoids algorithm produced the worst clustering results overall.
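      The two best-performing comparison functions named in the abstract, and the silhouette-based evaluation, can be illustrated with a small sketch. This is not the thesis code. It assumes a per-angle distance of the form 2(1 - cos Δθ), which the abstract does not specify, and it uses a hypothetical toy data set of loops described by two dihedral angles each. The `eps` and `min_pts` values, the function names, and the simplified DBSCAN and silhouette routines are likewise illustrative choices.

```python
import math

def dihedral_distance(a, b):
    """Assumed per-angle distance (degrees in): 2 * (1 - cos(a - b)), range [0, 4]."""
    return 2.0 * (1.0 - math.cos(math.radians(a - b)))

def sum_distance(loop_a, loop_b):
    """Sum of the per-angle distances over two equal-length loops."""
    return sum(dihedral_distance(a, b) for a, b in zip(loop_a, loop_b))

def sup_distance(loop_a, loop_b):
    """Supremum (largest) per-angle distance over two equal-length loops."""
    return max(dihedral_distance(a, b) for a, b in zip(loop_a, loop_b))

def dbscan(points, metric, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # Each neighbor list includes the point itself (distance 0).
    neighbors = [[j for j in range(n) if metric(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1              # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])   # expand from core points only
    return labels

def mean_silhouette(points, labels, metric):
    """Average silhouette coefficient over non-noise points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)      # convention for singleton clusters
                continue
            a = sum(metric(points[i], points[j])
                    for j in members if j != i) / (len(members) - 1)
            b = min(sum(metric(points[i], points[j]) for j in other) / len(other)
                    for other_lab, other in clusters.items() if other_lab != lab)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy loops: (phi, psi) in degrees, two tight groups plus one outlier.
loops = [(-60, -45), (-62, -43), (-58, -47),
         (-120, 130), (-118, 132), (-122, 128),
         (60, 40)]

labels_sum = dbscan(loops, sum_distance, eps=0.05, min_pts=3)
labels_sup = dbscan(loops, sup_distance, eps=0.05, min_pts=3)
score_sum = mean_silhouette(loops, labels_sum, sum_distance)
print(labels_sum, labels_sup, round(score_sum, 3))
```

      Because the cosine term is periodic, angles of 179 and -179 degrees are treated as close neighbors, which is the main reason to prefer a dihedral-aware distance over a plain difference of angles. The sum variant accumulates small deviations along the loop, while the supremum variant is governed by the single worst-matching angle.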