OPTIMAL SUBSEQUENCE BIJECTION AND CLASSIFICATION OF IMBALANCED DATA SETS

Koknar-Tezel, Suzan
Genre
Thesis/Dissertation
Date
2011
Department
Computer and Information Science
DOI
http://dx.doi.org/10.34944/dspace/1632
Abstract
Time series are common in many research fields. Since both a query and a target sequence may be noisy, i.e., contain some outlier elements, it is desirable to exclude the outlier elements from matching in order to obtain a robust matching performance. Moreover, in many applications like shape alignment or stereo correspondence it is also desirable to have a one-to-one and onto correspondence (a bijection) between the remaining elements. To address the problem of noisy time series data we propose using an algorithm that determines the optimal subsequence bijection (OSB) of a query and target time series. The OSB is efficiently computed since the problem’s solution is mapped to a cheapest path in a DAG (directed acyclic graph). We make several significant improvements to the original OSB algorithm and show that these improvements are theoretically and experimentally justified. We compare OSB to standard and state-of-the-art distance measures such as Euclidean distance, Dynamic Time Warping with and without a warping window, Longest Common Subsequence, Edit Distance with Real Penalty, and Time Warp Edit Distance. Moreover, we show that OSB is particularly suitable for partial matching.
In addition to noisy data, imbalanced time series data sets present a particular challenge to the data mining community. Often, it is the rare event that is of interest, and the cost of misclassifying the rare event is higher than that of misclassifying the usual event. When the data is highly skewed toward the usual, it can be very difficult for a learning system to accurately detect the rare event. There have been many approaches in recent years for handling imbalanced data sets, from under-sampling the majority class to adding synthetic points to the minority class in feature space.
To address the problem of imbalanced data sets, we present an innovative approach to adding synthetic points (ghost points) to the minority class in distance space and theoretically show that these points preserve the distances. All current methods that add synthetic points to minority classes do so in feature space. However, distances between time series are known to be non-Euclidean and non-metric, since comparing time series requires warping in time. In addition, in some fields data is not available as feature vectors, but instead as pairwise distances between objects in the data set. Therefore, the only recourse for augmenting the minority class is to add synthetic points in distance space. Our experimental results on standard time series using standard distance measures show that our synthetic points significantly improve the classification rate of the rare events, and in most cases also improve the overall accuracy of support vector machines. We also show how adding our synthetic points can aid in the visualization of time series data sets.
For time series classification, a large number of similarity approaches have been developed, with the main focus being the comparison or matching of pairs of time series. In these approaches, other time series do not influence the similarity measure of a given pair of time series. By using the locally constrained diffusion process (LCDP), other time series do influence the similarity measure of each pair of time series, and we show that this influence is beneficial. The influence of other time series is propagated as a diffusion process on a graph formed by a given set of time series. We use LCDP when densifying the minority class data space by adding ghost points. Our experimental results demonstrate that using LCDP when densifying the minority class also improves the classification rate of the minority class.
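The abstract states that OSB maps the matching problem to a cheapest path in a DAG. The sketch below illustrates that general idea only and is not the thesis's algorithm: it is a simplified dynamic program over a grid-shaped DAG that finds an order-preserving one-to-one matching between two sequences while allowing elements to be skipped. The function name `subsequence_bijection_cost`, the constant `skip_cost` penalty, and the squared-difference element cost are all assumptions made for illustration.

```python
def subsequence_bijection_cost(query, target, skip_cost):
    """Cheapest order-preserving one-to-one matching of two sequences.

    Simplified illustration (assumption, not the thesis's OSB):
    matching query[i] to target[j] costs (query[i] - target[j]) ** 2,
    and leaving any element unmatched costs a fixed skip_cost.
    """
    m, n = len(query), len(target)
    # best[i][j] = cheapest cost of aligning query[:i] with target[:j];
    # each cell is a node of a grid DAG with edges right, down, diagonal.
    best = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        best[i][0] = i * skip_cost
    for j in range(1, n + 1):
        best[0][j] = j * skip_cost

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = best[i - 1][j - 1] + (query[i - 1] - target[j - 1]) ** 2
            skip_query = best[i - 1][j] + skip_cost
            skip_target = best[i][j - 1] + skip_cost
            best[i][j] = min(match, skip_query, skip_target)

    # Walk the cheapest path backwards to recover the matched index pairs.
    pairs = []
    i, j = m, n
    while i > 0 and j > 0:
        match = best[i - 1][j - 1] + (query[i - 1] - target[j - 1]) ** 2
        if best[i][j] == match:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif best[i][j] == best[i - 1][j] + skip_cost:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[m][n], pairs


# Usage: the outlier 9.0 in the query is left unmatched.
cost, pairs = subsequence_bijection_cost([1.0, 2.0, 9.0, 3.0], [1.0, 2.0, 3.0], 1.0)
print(cost, pairs)
```

In this simplified form, `skip_cost` controls how readily an element is treated as an outlier: a small value makes skipping cheap, while a large value forces elements to be matched even when they are far apart.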