Genre
Thesis/Dissertation
Date
2025-05
Department
Computer and Information Science
DOI
https://doi.org/10.34944/0d4s-kd84
Abstract
The exponential growth of scientific literature, with millions of new articles published annually, has created an unsustainable discovery bottleneck across research communities. Manual extraction of critical information, including methodologies, datasets, and domain-specific terminologies, now consumes a substantial proportion of researchers' literature review time. The impact is greatest in time-sensitive fields such as climate science and biomedical research, where delayed insights hinder urgent policy decisions or therapeutic developments. Automated information extraction systems have transitioned from supplemental tools to essential infrastructure, addressing three critical imperatives: preserving collective understanding through cross-publication discovery linking, enabling real-time knowledge synthesis in rapidly evolving domains, and democratizing access to specialized findings via structured knowledge representation. Without robust frameworks, the scientific community risks perpetuating redundant investigations, overlooking critical interdisciplinary connections, and failing to transform publication volume into actionable insight networks.
Current information extraction paradigms face four fundamental technical challenges rooted in the unique characteristics of scientific communication. First, terminological instability arises from continuous conceptual evolution, where emerging constructs like "attribution-based climate models" and "GPT-4.5" outpace standardized taxonomies, generating persistent errors in entity disambiguation.
Second, structural heterogeneity manifests as hundreds of distinct formats for describing methodology, observed even within focused disciplines like materials science, which complicates pattern generalization.
Third, contextual dependency demands adaptive interpretation of concepts such as "deep learning," whose technical meanings diverge fundamentally between protein folding architectures and geospatial mapping applications.
Fourth, the scalability-accuracy tradeoff forces untenable compromises between precision (undermined by frequent LLM hallucinations) and coverage (limited by traditional NLP's oversight of domain-specific abbreviations).
These technical barriers are compounded by systemic data limitations: existing corpora cover only a small fraction of specialized domains and exhibit annotation inconsistencies that undermine model reliability. Emerging paradigms in hierarchical relationship modeling hint at potential resolutions through hybrid neural-symbolic architectures.
This research advances scientific information extraction through three interconnected contributions: the development of domain-annotated corpora spanning climate science and computer science; a systematic evaluation of machine learning architectures across extraction tasks and disciplinary contexts; and demonstrated pathways for transforming extracted entities into evolvable knowledge graphs. By creating structured repositories that capture methodological lineages, dataset dependencies, and conceptual evolution patterns, this work provides researchers with interoperable frameworks for mapping relationships across fragmented scientific domains. The resulting infrastructure enables both precision-focused analysis within specialized fields and cross-domain knowledge discovery, offering scalable solutions for organizing literature while preserving disciplinary nuance. Together, these contributions address the dual challenges of maintaining taxonomic rigor and enabling adaptive knowledge synthesis in modern scientific communication ecosystems.
ADA compliance
For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu
