dc.coverage.temporal: 2006-2019 [en_US]
dc.creator: Iliadis, Andrew
dc.creator: Acker, Amelia
dc.date.accessioned: 2021-08-18T20:30:41Z
dc.date.available: 2021-08-18T20:30:41Z
dc.date.issued: 2021-08-18
dc.identifier.citation: Iliadis, A., & Acker, A. (2021). Topic Modeling of Palantir Patents and List of Palantir Contracts. Temple University. [en_US]
dc.identifier.uri: http://hdl.handle.net/20.500.12613/6805
dc.description: For this study, we scraped all of Palantir’s patents that contained the word “ontology” (as of 08/25/20) from Google Patents. This produced a purposive sample (n=155) of Palantir patents, consisting of 5197 pages, over 2.5 million words, and over 18.5 million characters. We then prepared the data set for processing by stripping all the metadata and special features, converting formats, compressing, and collating the patents together. We imported several Python libraries used for data processing (Pandas, Matplotlib, NumPy, and Seaborn), and Google Colaboratory was used to assemble the patent data, which was then loaded in a textual paragraph format. Preprocessing was then carried out, including punctuation, null value, and stop word removal, lemmatization, lowercase conversion, and tokenization, which resulted in a preprocessed data set. Part-of-speech (POS) tagging was performed, and the tokens were targeted in accordance with their corresponding POS based on context and definition (this produced most frequent nouns, verbs, etc.). Next, named entity recognition was performed to locate and classify entities in the text into predefined categories such as persons, organizations, locations, times, quantities, monetary values, percentages, etc. Topic modeling was performed using a bag-of-words model and Latent Dirichlet Allocation. We also downloaded a list of US Government contracts with Palantir from the Federal Procurement Data System (as of 08/04/21). [en_US]
dc.description.abstract: Palantir is one of the most secretive and understudied surveillance firms in the US. The company supplies information technology (IT) solutions to governments, nonprofits, and corporations, focusing on data integration and surveillance services. We begin by sketching Palantir’s company history and contract network, followed by an explanation of key terms associated with Palantir’s area of technology specialization and a description of the firm’s platform ecosystem. We then provide a summary of current scholarship on Palantir’s continuing role in policing, intelligence, and security operations. Our primary contribution and analysis are a computational topic modeling of a purposive sample (n=155) of Palantir’s surveillance patents including their topics and themes. This approach follows recent literature that uses patents as primary data for researching the surveillance imaginaries and capabilities of IT firms. We end by discussing the concept of infrastructuring to understand Palantir as a surveillance platform, where information standards like administrative metadata are theorized as phenomena for structuring entities in and through access to digital information. [en_US]
dc.language: english [en_US]
dc.language.iso: eng [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike CC BY-NC-SA [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.subject: Electronic surveillance [en_US]
dc.title: Topic Modeling of Palantir Patents and List of Palantir Contracts [en_US]
dc.type: Dataset [en_US]
dc.type.genre: Dataset [en_US]
dc.description.department: Department of Media Studies and Production [en_US]
dc.relation.doi: http://dx.doi.org/10.34944/dspace/6787
dc.ada.note: For Americans with Disabilities Act (ADA) accommodation, including help with reading this content, please contact scholarshare@temple.edu [en_US]
dc.description.schoolcollege: Klein College of Media and Communication [en_US]
dc.description.sponsor: Temple University Grant-in-Aid Award [en_US]
dc.description.softwarecreate: Jupyter Notebook, Excel, Word [en_US]
dc.description.softwareprocess: Python [en_US]
dc.temple.creator: Iliadis, Andrew
refterms.dateFOA: 2021-08-18T20:30:41Z
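
Several of the processing steps named in dc.description (lowercase conversion, punctuation and stop word removal, tokenization, a bag-of-words model, and Latent Dirichlet Allocation) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code shipped in PalantirTopicModeling.txt: the stop word list, the sample sentences, the function names, and the hyperparameters are all invented here. Lemmatization, POS tagging, and named entity recognition are left out because the record does not say which library handled them. LDA is fitted with a small collapsed Gibbs sampler, which may differ from whatever implementation the authors used.

```python
import random
import re

# Tiny illustrative stop word list (an assumption; the study's list is not given).
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "are",
              "for", "with", "by", "on"}


def preprocess(text):
    """Lowercase, drop punctuation and digits, tokenize, remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(token_lists):
    """Build a sorted vocabulary and encode each document as word ids."""
    vocab = sorted({t for doc in token_lists for t in doc})
    index = {word: i for i, word in enumerate(vocab)}
    return vocab, [[index[t] for t in doc] for doc in token_lists]


def lda_gibbs(encoded, vocab_size, num_topics=2, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns topic-word and doc-topic counts."""
    rng = random.Random(seed)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    doc_topic = [[0] * num_topics for _ in encoded]
    topic_total = [0] * num_topics
    assign = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(encoded):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            doc_topic[d][z] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(encoded):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[z][w] -= 1
                doc_topic[d][z] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                topic_word[z][w] += 1
                doc_topic[d][z] += 1
                topic_total[z] += 1
    return topic_word, doc_topic


def top_words(topic_word, vocab, n=3):
    """Return the n highest-count words for each topic."""
    result = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda i: -row[i])
        result.append([vocab[i] for i in order[:n]])
    return result


# Made-up toy sentences, not text from the patent corpus.
docs = [
    "The ontology defines object types and properties for the data.",
    "Objects in the ontology are linked by properties.",
    "The map displays locations and routes for vehicles.",
    "Vehicles and routes are shown on the map.",
]
tokens = [preprocess(d) for d in docs]
vocab, encoded = bag_of_words(tokens)
topic_word, doc_topic = lda_gibbs(encoded, len(vocab))
for k, words in enumerate(top_words(topic_word, vocab)):
    print(k, words)
```

The fixed seed makes a run repeatable; on a corpus of real size, a library implementation of LDA would normally replace the hand-written sampler.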


Files in this item

Name: PalantirTopicModeling.txt
Size: 4.493Mb
Format: Text file
Description: Python code for the topic modeling ...

Name: PalantirPatentsList.xlsx
Size: 32.15Kb
Format: Microsoft Excel 2007
Description: List of all the patents used in ...

Name: PalantirPatentsList.csv
Size: 56.70Kb
Format: Unknown
Description: CSV version of list of all the ...

Name: PalantirUSGovContractsCorpus.pdf
Size: 1.206Mb
Format: PDF
Description: Copies of contracts referenced ...

Name: PalantirUSGovContractsList.csv
Size: 369.4Kb
Format: Unknown
Description: List of all the contracts included ...

Name: PalantirPatentsCorpus.pdf
Size: 101.7Mb
Format: PDF
Description: Copies of patents used in the ...

Name: README_Iliadis_Palantir.txt
Size: 6.279Kb
Format: Text file
Description: Readme


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike CC BY-NC-SA