Algorithms for NP-hard Optimization Problems and Cluster Analysis
dc.contributor.advisor | Latecki, Longin | |
dc.creator | Li, Nan | |
dc.date.accessioned | 2020-11-04T16:10:06Z | |
dc.date.available | 2020-11-04T16:10:06Z | |
dc.date.issued | 2017 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12613/3195 | |
dc.description.abstract | The set cover problem, the weighted set cover problem, the minimum dominating set problem and the minimum weighted dominating set problem are classical NP-hard optimization problems of great importance in both theory and real applications. Since exact algorithms, which require exhaustive exploration of exponentially many options, are infeasible in practice, approximation algorithms and heuristic algorithms are widely used to find reasonably good solutions in polynomial time. I propose novel algorithms for these problems. My algorithms for the weighted set cover and minimum weighted dominating set problems are based on a three-step strategy. For the weighted set cover problem, in the first step, we reserve the sets indispensable for the optimal solution and reduce the problem size. In the second step, we build a robust solution with a novel greedy heuristic: sets are iteratively selected according to a measure that integrates the weight, the coverage gain for the current iteration and the global coverage capacity of each set. It favors sets that have smaller weights and that better extend or consolidate the coverage, especially on items contained in fewer sets. Since the obtained solution tends to have robust coverage, in the third step, we further improve it by removing redundant sets in an efficient way. For the minimum weighted dominating set problem, we first reserve the vertices indispensable for the optimal solution and then convert the problem into a weighted set cover problem to solve it. These two algorithms can also solve the set cover problem and the minimum dominating set problem by simply treating all sets or vertices as having the same weight. Extensive experimental evaluations on a large number of synthetic and real-world set cover instances and graphs from many domains demonstrate the superiority of my algorithms over state-of-the-art methods.

Cluster analysis is a fundamental problem in data analysis, with extensive applications in artificial intelligence, statistics and even the social sciences. The goal is to partition the data objects into groups (clusters) such that objects in the same group are similar, while objects in different groups are dissimilar. Most existing clustering algorithms are designed to handle data with only one type of attribute, e.g., continuous, categorical or ordinal. Mixed data clustering has received relatively little attention, despite the fact that data with mixed types of attributes are common in real applications. I propose a novel affinity-learning-based framework for mixed data clustering, which covers how to process data with mixed-type attributes, how to learn affinities between data points, and how to exploit the learned affinities for clustering. In the proposed framework, each original data attribute is represented by several abstract objects defined according to the specific data type and values. Each attribute value is transformed into initial affinities between the data point and the abstract objects of the attribute. I refine these affinities and infer the unknown affinities between data points by taking into account the interconnections among the attribute values of all data points. The inferred affinities between data points can be exploited directly for clustering. Alternatively, the refined affinities between data points and the abstract objects of the attributes can be transformed into new data features for clustering. Experimental results on many real-world data sets demonstrate that the proposed framework is effective for mixed data clustering. This work was published in our IJCAI 2017 paper Li & Latecki (2017).

Clustering aggregation, also known as consensus clustering or clustering ensemble, aims to find a single superior clustering from a number of input clusterings obtained by different algorithms with different parameters. I formulate clustering aggregation as a special instance of the maximum-weight independent set (MWIS) problem. For a given data set, an attributed graph is constructed from the union of the input clusterings. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation, and the edges connect vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters that together partition the data set. I formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure: since the clusters of each input clustering form a partition of the data set, the vertices corresponding to each input clustering form a maximal independent set (MIS) in the attributed graph. I propose a variant of simulated annealing that takes advantage of this special structure. My algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and uses a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging data sets show that both my algorithm for the maximum-weight independent set problem and my approach to clustering aggregation achieve good performance. This work was published in our NIPS 2012 paper Li & Latecki (2012), and some new results were published in our IJCAI 2017 paper Fan et al. (2017). | |
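To make the weighted set cover discussion in the abstract concrete, the following minimal Python sketch shows the classical greedy baseline (pick the set with the best coverage-gain-to-weight ratio) followed by a simple redundancy-removal pass in the spirit of the third step described above. It is not the dissertation's algorithm: the indispensable-set reduction and the novel selection measure are not reproduced, and all function and parameter names are hypothetical.

def greedy_weighted_set_cover(universe, sets, weights):
    """Classical greedy baseline. sets: dict name -> frozenset of items;
    weights: dict name -> positive float."""
    uncovered = set(universe)
    solution = []
    while uncovered:
        # Pick the set with the best coverage-gain-to-weight ratio.
        best = max(sets, key=lambda s: len(sets[s] & uncovered) / weights[s])
        if not sets[best] & uncovered:
            raise ValueError("infeasible instance: some items cannot be covered")
        solution.append(best)
        uncovered -= sets[best]
    return remove_redundant(solution, sets, weights, universe)

def remove_redundant(solution, sets, weights, universe):
    """Drop sets (heaviest first) whose items stay covered by the remaining sets."""
    kept = list(solution)
    for s in sorted(solution, key=lambda s: -weights[s]):
        covered_by_others = set().union(*(sets[t] for t in kept if t != s)) if len(kept) > 1 else set()
        if set(universe) <= covered_by_others:
            kept.remove(s)
    return kept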
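The conversion of the minimum weighted dominating set problem into a weighted set cover problem mentioned in the abstract can be illustrated with the standard reduction below: each vertex v contributes the set consisting of v and its neighbors, carrying the vertex weight, so a cover of the vertex universe by these sets is exactly a dominating set. This is only the textbook reduction; the dissertation's step of reserving indispensable vertices is omitted, and the helper and toy inputs are hypothetical.

def dominating_set_as_set_cover(adjacency, vertex_weights):
    """adjacency: dict vertex -> iterable of neighbors;
    vertex_weights: dict vertex -> positive float."""
    universe = set(adjacency)
    # Closed neighborhood N[v] = {v} plus the neighbors of v.
    sets = {v: frozenset({v}) | frozenset(adjacency[v]) for v in adjacency}
    return universe, sets, dict(vertex_weights)

# Usage with the greedy sketch above (hypothetical toy graph):
# adj = {1: [2], 2: [1, 3], 3: [2]}
# w = {1: 1.0, 2: 0.5, 3: 1.0}
# universe, sets, weights = dominating_set_as_set_cover(adj, w)
# print(greedy_weighted_set_cover(universe, sets, weights))  # [2]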
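The clustering-aggregation formulation can likewise be illustrated with a small sketch that builds the attributed graph: one vertex per cluster drawn from the input clusterings, a weight supplied by a caller-provided quality function standing in for the internal cohesion/separation index, and an edge between any two clusters that share data points. The function name and data layout are assumptions for illustration, not the dissertation's implementation.

def build_aggregation_graph(clusterings, quality):
    """clusterings: list of input clusterings, each a list of frozensets of point ids.
    quality: callable mapping a cluster (frozenset) to a positive weight."""
    vertices = []                                  # (clustering_id, cluster) pairs
    for cid, clustering in enumerate(clusterings):
        for cluster in clustering:
            vertices.append((cid, frozenset(cluster)))
    weights = [quality(cluster) for _, cluster in vertices]
    edges = set()
    for i in range(len(vertices)):
        for j in range(i + 1, len(vertices)):
            if vertices[i][1] & vertices[j][1]:    # overlapping clusters conflict
                edges.add((i, j))
    return vertices, weights, edges

# Because the clusters of one input clustering are pairwise disjoint, the vertices
# it contributes form an independent set in this graph, matching the special
# structure the abstract exploits.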
dc.format.extent | 101 pages | |
dc.language.iso | eng | |
dc.publisher | Temple University. Libraries | |
dc.relation.ispartof | Theses and Dissertations | |
dc.rights | IN COPYRIGHT- This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available. | |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | |
dc.subject | Computer Science | |
dc.title | Algorithms for NP-hard Optimization Problems and Cluster Analysis | |
dc.type | Text | |
dc.type.genre | Thesis/Dissertation | |
dc.contributor.committeemember | Ling, Haibin | |
dc.contributor.committeemember | Vucetic, Slobodan | |
dc.contributor.committeemember | Zhang, Yimin | |
dc.description.department | Computer and Information Science | |
dc.relation.doi | http://dx.doi.org/10.34944/dspace/3177 | |
dc.ada.note | For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu | |
dc.description.degree | Ph.D. | |
refterms.dateFOA | 2020-11-04T16:10:06Z |