    Exploration of 3D Images to Understand 3D Real World

    Name: TETDEDXLi-temple-0225E-12694.pdf
    Size: 19.25 MB
    Format: PDF
    Genre: Thesis/Dissertation
    Date: 2016
    Author: Li, Peiyi
    Advisor: Ling, Haibin
    Committee members: Shi, Justin Y.; Vucetic, Slobodan; Zheng, Yefeng, 1975-
    Department: Computer and Information Science
    Subjects: Computer Science; Artificial Intelligence; Computer Vision; Deep Learning; Depth Map; Machine Learning
    Permanent link to this record: http://hdl.handle.net/20.500.12613/3194
    
    DOI: http://dx.doi.org/10.34944/dspace/3176
    Abstract
    Our world is composed of three-dimensional objects: every one of us lives in a world with X, Y, and Z axes. Although the way we usually record the world is to take a photo, reducing it from three dimensions to two, the most natural and vivid way to understand the world and interact with it is to sense it in 3D. We humans sense our 3D world every day with a built-in stereo system: our two eyes. In other words, the raw data human beings use to recognize the real 3D world carries depth information. It is natural to ask: will it help if we give machines the depth map of a scene when they interpret the 3D world with computer vision techniques? The answer is yes. Following this idea, my research focuses on 3D topics in computer vision.

    The three-dimensional world is the most intuitive and vivid world humans can perceive. In the past, 3D raw data was very costly to obtain, but this changed with the release of many 3D sensors in recent decades, and these modern sensors motivated my choice of research direction. Today, 3D sensors are used in many industries. In the gaming industry, inexpensive commercial indoor 3D sensors generate 3D point clouds of indoor environments; they provide depth information to traditional computer vision algorithms, achieve state-of-the-art detection of the human body skeleton, and open new ways to interact with computers. In the medical industry, cone beam computed tomography (CBCT) gives doctors a volumetric picture of the structure of target soft and hard tissue; by extending pattern recognition algorithms from 2D to 3D, computer vision can offer doctors 3D texture features and assist diagnosis.

    My research follows these two lines. In medical imaging, I examine 3D trabecular bone structure and use computer vision tools to interpret the subtlest density changes. In human-computer interaction, I study 3D point clouds to estimate human hand pose.

    First, in medical imaging, the goal is an algorithm that distinguishes bone texture patterns, a task that is critical in clinical diagnosis: variations in trabecular bone texture are known to correlate with bone diseases such as osteoporosis. We propose a multi-feature multi-ROI (MFMR) approach for analyzing trabecular patterns inside the oral cavity using cone beam computed tomography (CBCT) volumes. For each dental CBCT volume, a set of features including fractal dimension, multi-fractal spectrum, and gradient-based features is extracted from eight regions of interest (ROIs) to address the low image quality of trabecular patterns. We then use generalized multi-kernel learning (GMKL) to fuse these features and distinguish trabecular patterns from different groups. To validate the proposed method, we apply it to distinguishing trabecular patterns from different gender-age groups. On a dataset of dental CBCT volumes from 96 subjects divided into gender-age subgroups, our approach achieves a 96.1% average classification rate, greatly outperforming approaches without the feature fusion.
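    As a rough illustration of the feature-extraction and fusion steps just described, the Python sketch below computes a box-counting fractal dimension for a 2D ROI and combines per-feature RBF kernels with fixed weights before training a kernel SVM. It is a minimal stand-in under invented assumptions, not the dissertation's implementation: the helper names (box_counting_fractal_dimension, fused_kernel), feature dimensions, kernel weights, and gamma values are made up for the example, and GMKL would learn the kernel combination rather than fix it.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel


def box_counting_fractal_dimension(roi, threshold=0.5):
    """Estimate the fractal dimension of a 2D ROI by box counting."""
    binary = roi > threshold
    size = min(binary.shape)
    scales = [2 ** k for k in range(1, int(np.log2(size)))]
    counts = []
    for s in scales:
        h = binary.shape[0] // s * s
        w = binary.shape[1] // s * s
        # Reshape into s-by-s blocks and count blocks containing foreground.
        blocks = binary[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
    # Slope of log N(s) versus log(1/s) approximates the fractal dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.array(scales)), np.log(counts), 1)
    return slope


def fused_kernel(feature_blocks, weights, gammas):
    """Convex combination of per-feature RBF kernels (a stand-in for GMKL)."""
    n = feature_blocks[0].shape[0]
    K = np.zeros((n, n))
    for X, w, g in zip(feature_blocks, weights, gammas):
        K += w * rbf_kernel(X, gamma=g)
    return K


rng = np.random.default_rng(0)
print("fractal dimension of a noise ROI:",
      round(box_counting_fractal_dimension(rng.random((64, 64))), 2))

# Toy data: 96 subjects, three illustrative feature groups (sizes invented).
fractal_feats = rng.normal(size=(96, 8))    # e.g. one fractal dimension per ROI
spectrum_feats = rng.normal(size=(96, 40))  # multi-fractal spectrum features
gradient_feats = rng.normal(size=(96, 24))  # gradient-based features
labels = rng.integers(0, 2, size=96)

K = fused_kernel([fractal_feats, spectrum_feats, gradient_feats],
                 weights=[0.4, 0.3, 0.3], gammas=[0.1, 0.05, 0.05])
clf = SVC(kernel="precomputed").fit(K, labels)
print("training accuracy on toy data:", clf.score(K, labels))
```

    In the actual MFMR pipeline, the fractal, multi-fractal spectrum, and gradient features would be extracted from the eight CBCT ROIs rather than drawn at random as in this toy example.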
    In human-computer interaction, the most natural way to point at things or express an idea is with the hand, so I am motivated to estimate all skeletal joint locations in 3D space, which is the foundation of gesture understanding: from these joint locations we can reason about the semantics behind a hand gesture. The task is therefore to estimate a hand pose in 3D space by locating all skeletal joints. We propose a real-time 3D hand pose estimation algorithm built on the randomized decision forest framework; it takes a depth image as input and produces a set of skeletal joints as output. Previous decision-forest-based methods often assign labels to all points in the point cloud at a very early stage and then vote for joint locations. By contrast, our algorithm tracks a set of more flexible virtual landmark points, called segmentation index points (SIPs), before reaching the final decision at a leaf node (a rough structural sketch appears after the abstract). Roughly speaking, an SIP represents the centroid of a subset of skeletal joints, which are located at the leaves of the branch expanded from that SIP. Inspired by a latent regression-forest-based hand pose estimation framework, we integrate SIPs into that framework with several important improvements. Experimental results on public benchmark datasets clearly show the advantage of the proposed algorithm over previous state-of-the-art methods, and it runs at 55.5 fps on an ordinary CPU without parallelism.

    After working with RGB-D (RGB plus depth) images, we ran into another issue: when we tried to build applications on our algorithms, we found it hard to do, because most devices today carry only RGB cameras and recent smart devices rarely include RGB-D cameras, leaving us unable to apply the algorithms in more general scenarios. I therefore changed perspective and explored 3D reconstruction with ordinary RGB cameras, shifting our attention to human face analysis in RGB images. Detecting faces in photos is critical for intelligent applications, but it is far from enough for modern scenarios: many applications require accurate localization of facial landmarks. Face alignment (FA) is critical for face analysis and has been studied extensively in recent years. For academia, work along this line is challenging when face images contain extreme poses, lighting, expressions, occlusions, and so on; FA is also a fundamental component of face analysis algorithms. For industry, once these facial key-point locations are available, many previously impossible applications become reachable, so a robust FA algorithm is in great demand. We developed our proposed convolutional neural network (CNN) in the deep learning framework Caffe, using a GPU server with eight NVIDIA Titan X GPUs. Once the CNN structure was finalized, thousands of human-labeled face images were used to train it on a two-node GPU cluster connected by InfiniBand, each node equipped with four NVIDIA K40 GPUs. Our framework outperforms state-of-the-art deep learning algorithms.
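    The following Python snippet is the structural sketch referenced in the hand-pose discussion above: it illustrates the SIP idea as a hand-built hierarchy of landmark nodes, where each node stands for the centroid of a subset of joints and its children refine that subset until single joints remain at the leaves. The SIPNode class, the principal-axis partitioning, and the toy four-joint tree are invented for illustration only; the dissertation's forest instead learns depth-difference split functions from training data.

```python
import numpy as np


class SIPNode:
    """One node of a toy SIP hierarchy (illustrative, not a trained forest)."""

    def __init__(self, joint_ids, children=None):
        self.joint_ids = joint_ids      # skeletal joints handled by this branch
        self.children = children or []  # empty list => leaf (a single joint)

    def estimate(self, points, estimates):
        # The SIP of this node is approximated by the centroid of the points
        # currently assigned to its joint subset.
        sip = points.mean(axis=0)
        if not self.children:
            estimates[self.joint_ids[0]] = sip
            return estimates
        # Crude stand-in for a learned split: partition the points along the
        # axis of largest variance and recurse, one partition per child.
        axis = int(np.argmax(points.var(axis=0)))
        order = np.argsort(points[:, axis])
        parts = np.array_split(points[order], len(self.children))
        for child, part in zip(self.children, parts):
            child.estimate(part, estimates)
        return estimates


# Toy example: a 4-joint "hand" with two SIP levels above the joint leaves.
tree = SIPNode([0, 1, 2, 3], [
    SIPNode([0, 1], [SIPNode([0]), SIPNode([1])]),
    SIPNode([2, 3], [SIPNode([2]), SIPNode([3])]),
])
cloud = np.random.default_rng(1).normal(size=(500, 3))  # stand-in depth points
joints = tree.estimate(cloud, {})
print({joint: pos.round(2) for joint, pos in joints.items()})
```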
    Collections: Theses and Dissertations


     
