    OPTIMAL SUBSEQUENCE BIJECTION AND CLASSIFICATION OF IMBALANCED DATA SETS

    Name:
    KoknarTezel_temple_0225E_10565.pdf
    Size:
    2.465Mb
    Format:
    PDF
    Genre
    Thesis/Dissertation
    Date
    2011
    Author
    Koknar-Tezel, Suzan
    Advisor
    Latecki, Longin
    Committee member
    Yates, Alexander
    Ling, Haibin
    Hodgson, J. P. E. (Jonathan P. E.), 1942-
    Department
    Computer and Information Science
    Subject
    Computer Science
    Permanent link to this record
    http://hdl.handle.net/20.500.12613/1650
    
    DOI
    http://dx.doi.org/10.34944/dspace/1632
    Abstract
    Time series are common in many research fields. Since both a query and a target sequence may be noisy, i.e., contain some outlier elements, it is desirable to exclude the outlier elements from matching in order to obtain a robust matching performance. Moreover, in many applications like shape alignment or stereo correspondence it is also desirable to have a one-to-one and onto correspondence (a bijection) between the remaining elements. To address the problem of noisy time series data we propose using an algorithm that determines the optimal subsequence bijection (OSB) of a query and target time series. The OSB is efficiently computed since the problem’s solution is mapped to a cheapest path in a DAG (directed acyclic graph). We make several significant improvements to the original OSB algorithm and show that these improvements are theoretically and experimentally justified. We compare OSB to standard and state-of-the-art distance measures such as Euclidean distance, Dynamic Time Warping with and without warping window, Longest Common Subsequence, Edit Distance with Real Penalty, and Time Warp Edit Distance. Moreover, we show that OSB is particularly suitable for partial matching.

    In addition to noisy data, imbalanced time series data sets present a particular challenge to the data mining community. Often, it is the rare event that is of interest and the cost of misclassifying the rare event is higher than misclassifying the usual event. When the data is highly skewed toward the usual, it can be very difficult for a learning system to accurately detect the rare event. There have been many approaches in recent years for handling imbalanced data sets, from under-sampling the majority class to adding synthetic points to the minority class in feature space. To address the problem of imbalanced data sets, we present an innovative approach to adding synthetic points (ghost points) to the minority class in distance space and theoretically show that these points preserve the distances. All current methods that add synthetic points to minority classes do so in feature space. However, distances between time series are known to be non-Euclidean and non-metric, since comparing time series requires warping in time. In addition, in some fields data is not available as feature vectors, but instead as pairwise distances between objects in the data set. Therefore the only recourse to augmenting the minority class is to add synthetic points in distance space. Our experimental results on standard time series using standard distance measures show that our synthetic points significantly improve the classification rate of the rare events, and in most cases also improve the overall accuracy of support vector machines. We also show how adding our synthetic points can aid in the visualization of time series data sets.

    For time series classification, a large number of similarity approaches have been developed, with the main focus being the comparison or matching of pairs of time series. In these approaches, other time series do not influence the similarity measure of a given pair of time series. By using the locally constrained diffusion process (LCDP), other time series do influence the similarity measure of each pair of time series, and we show that this influence is beneficial. The influence of other time series is propagated as a diffusion process on a graph formed by a given set of time series. We use LCDP when densifying the minority class data space by adding ghost points. Our experimental results demonstrate that using LCDP when densifying the minority class also improves the classification rate of the minority class.
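The idea of an order-preserving, one-to-one matching that is allowed to skip outlier elements can be illustrated with a small dynamic program. The sketch below is not the OSB algorithm from the thesis: it is a simplified stand-in that assumes a fixed per-element skip penalty and a squared-difference element cost, and the function name `match_with_skips` and its parameters are hypothetical. Filling the table amounts to finding a cheapest path through a grid-shaped DAG, which is loosely analogous to the cheapest-path formulation mentioned in the abstract.

```python
from typing import List, Sequence, Tuple


def match_with_skips(
    query: Sequence[float],
    target: Sequence[float],
    skip_penalty: float,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Order-preserving one-to-one matching that may skip elements.

    Assumptions (not taken from the thesis): every skipped element costs
    `skip_penalty`, and a matched pair costs the squared difference.
    Returns (total_cost, pairs), where pairs holds (query_idx, target_idx).
    """
    m, n = len(query), len(target)
    # cost[i][j]: cheapest way to process query[:i] and target[:j].
    cost = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i * skip_penalty  # skip all of query[:i]
    for j in range(1, n + 1):
        cost[0][j] = j * skip_penalty  # skip all of target[:j]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = cost[i - 1][j - 1] + (query[i - 1] - target[j - 1]) ** 2
            skip_q = cost[i - 1][j] + skip_penalty
            skip_t = cost[i][j - 1] + skip_penalty
            cost[i][j] = min(pair, skip_q, skip_t)

    # Walk back from the final cell to recover which elements were paired.
    pairs: List[Tuple[int, int]] = []
    i, j = m, n
    while i > 0 and j > 0:
        pair = cost[i - 1][j - 1] + (query[i - 1] - target[j - 1]) ** 2
        if cost[i][j] == pair:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + skip_penalty:
            i -= 1  # query element i-1 was skipped
        else:
            j -= 1  # target element j-1 was skipped
    pairs.reverse()
    return cost[m][n], pairs


if __name__ == "__main__":
    # The value 9.0 acts as an outlier in the query and is left unmatched.
    total, pairs = match_with_skips([1.0, 2.0, 9.0, 3.0], [1.0, 2.0, 3.0], 1.0)
    print(total, pairs)
```

With a skip penalty of 1.0, the outlier 9.0 is cheaper to skip than to pair with any target element, so the remaining elements are matched one-to-one in order. The choice of penalty controls how readily elements are treated as outliers; how the thesis sets its own skipping cost is not stated in the abstract.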
    ADA compliance
    For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
    Collections
    Theses and Dissertations

