Topic Modeling of Palantir Patents and List of Palantir Contracts
dc.coverage.temporal | 2006-2019 | en_US |
dc.creator | Iliadis, Andrew | |
dc.creator | Acker, Amelia | |
dc.date.accessioned | 2021-08-18T20:30:41Z | |
dc.date.available | 2021-08-18T20:30:41Z | |
dc.date.issued | 2021-08-18 | |
dc.identifier.citation | Iliadis, A., & Acker, A. (2021). Topic Modeling of Palantir Patents and List of Palantir Contracts. Temple University. | en_US |
dc.identifier.uri | http://hdl.handle.net/20.500.12613/6805 | |
dc.description | For this study, we scraped all of Palantir’s patents that contained the word “ontology” (as of 08/25/20) from Google Patents. This produced a purposive sample (n=155) of Palantir patents, consisting of 5,197 pages, over 2.5 million words, and over 18.5 million characters. We then prepared the data set for processing by stripping all metadata and special features, converting formats, compressing, and collating the patents together. We imported several Python libraries for data processing (Pandas, Matplotlib, NumPy, and Seaborn) and used Google Colaboratory to assemble the patent data, which was then loaded in a textual paragraph format. Preprocessing was then carried out, including removal of punctuation, null values, and stop words, as well as lemmatization, lowercase conversion, and tokenization, which resulted in a preprocessed data set. Part-of-speech (POS) tagging was performed, and the tokens were grouped by their POS based on context and definition (this produced the most frequent nouns, verbs, etc.). Next, named entity recognition was performed to locate and classify entities in the text into predefined categories such as persons, organizations, locations, times, quantities, monetary values, percentages, etc. Topic modeling was performed using a bag-of-words model and Latent Dirichlet Allocation. We also downloaded a list of US Government contracts with Palantir from the Federal Procurement Data System (as of 08/04/21). | en_US |
dc.description.abstract | Palantir is one of the most secretive and understudied surveillance firms in the US. The company supplies information technology (IT) solutions to governments, nonprofits, and corporations, focusing on data integration and surveillance services. We begin by sketching Palantir’s company history and contract network, followed by an explanation of key terms associated with Palantir’s area of technology specialization and a description of the firm’s platform ecosystem. We then provide a summary of current scholarship on Palantir’s continuing role in policing, intelligence, and security operations. Our primary contribution is a computational topic modeling analysis of a purposive sample (n=155) of Palantir’s surveillance patents, including their topics and themes. This approach follows recent literature that uses patents as primary data for researching the surveillance imaginaries and capabilities of IT firms. We end by discussing the concept of infrastructuring to understand Palantir as a surveillance platform, where information standards like administrative metadata are theorized as phenomena for structuring entities in and through access to digital information. | en_US |
dc.language | English | en_US |
dc.language.iso | eng | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike CC BY-NC-SA | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.subject | Electronic surveillance | en_US |
dc.title | Topic Modeling of Palantir Patents and List of Palantir Contracts | en_US |
dc.type | Dataset | en_US |
dc.type.genre | Dataset | en_US |
dc.description.department | Department of Media Studies and Production | en_US |
dc.relation.doi | http://dx.doi.org/10.34944/dspace/6787 | |
dc.ada.note | For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu | en_US |
dc.description.schoolcollege | Klein College of Media and Communication | en_US |
dc.description.sponsor | Temple University Grant-in-Aid Award | en_US |
dc.description.softwarecreate | Jupyter Notebook, Excel, Word | en_US |
dc.description.softwareprocess | Python | en_US |
dc.temple.creator | Iliadis, Andrew | |
refterms.dateFOA | 2021-08-18T20:30:41Z | |
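
The dc.description field above outlines a pipeline of preprocessing, a bag-of-words model, and Latent Dirichlet Allocation. The following is a minimal sketch of those three steps, not the authors' notebook: the record does not say which library, stop word list, number of topics, or hyperparameters were used, so every name and value here (STOP_WORDS, sample_docs, num_topics, alpha, beta) is hypothetical. It uses only the Python standard library and fits LDA with a collapsed Gibbs sampler, which is one common inference method among several. Lemmatization, POS tagging, and named entity recognition are left out because they need external language models.

```python
import random
import re
from collections import Counter

# Hypothetical stop word list; the record does not name the one the authors used.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for", "with", "that", "are", "on"}


def preprocess(text):
    """Lowercase, drop punctuation and digits, tokenize, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return (vocab, docs encoded as word ids, per-document token counts)."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    encoded = [[index[t] for t in doc] for doc in docs]
    counts = [Counter(doc) for doc in docs]
    return vocab, encoded, counts


def lda_gibbs(encoded, vocab_size, num_topics=2, alpha=0.1, beta=0.01,
              iterations=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (doc_topic, topic_word) counts."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in encoded]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(encoded):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(encoded):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, vocab, n=3):
    """List the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Invented toy paragraphs standing in for patent text.
sample_docs = [
    "The ontology defines object types and properties for data integration.",
    "A user interface displays a graph of linked data objects.",
    "Object properties in the ontology are mapped to database records.",
]
tokens = [preprocess(doc) for doc in sample_docs]
vocab, encoded, counts = bag_of_words(tokens)
doc_topic, topic_word = lda_gibbs(encoded, len(vocab))
print(top_words(topic_word, vocab))
```

On a corpus the size described in the record (155 patents, over 2.5 million words), a pure-Python sampler like this would be slow; an optimized library implementation would be the practical choice, and the number of topics would need to be selected and justified rather than fixed at two.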