Linked Data for Digital History
Connecting Data for Research
Victor de Boer
With input from Christophe Guéret, Serge ter Braake, Niels Ockeloen, Antske Fokkens, Dirk Roorda, Lora Aroyo, Johan Oomen, Oana Inel, Jan Wielemaker, Jeroen Entjes

Victor de Boer
Web & Media Group, CS, Vrije Universiteit Amsterdam
Netherlands Institute for Sound and Vision
Cultural Heritage
Digital History
Linked Data for Development

Digital History
Sub-discipline of digital humanities
Part of the historian's effort has moved from physical archives to digital ones
Cross-domain collaboration
Img: www.doaks.org, www.dkrz.de

Tools and visualisations
http://armstrongdigitalhistory.org/
http://www.vcdh.virginia.edu/courses/fall07/hius401-f/
http://digitalhistory.unl.edu/essays/thomasessay.php
http://www.philipvickersfithian.com/2013/05/gender-in-stacks-on-managing-small.html

“That is great. I would love that…
…but my research questions are slightly different.”
Img: Monty Python

[Figure labels: Aging, Data, Tool]
Fig: C. Guéret, based on http://redmonk.com/jgovernor/2007/04/05/why-applciations-are-like-fish-and-data-is-like0wine/

Even better
Do not bake the data into the tool; treat data as an end product.
Build tools on top of the data.
Make sure others can do so as well.
Fig: C. Guéret

Linked Data for Digital History
• Represent heterogeneous datasets, each with its own data model, in a common format: Resource Description Framework (RDF)
  – Link what can be linked
• Re-use and re-usability
• Linked Data is the (technically) best way to publish and share your (research) data
OBJECT, EVENT, PLACE, TIME, PERSON, CONCEPT, PROVENANCE

Some examples

Dutch Ships and Sailors
The Problem: ((Maritime) historical) data is not integrated
Datasets: KB NEWSPAPERS; Dutch-Asiatic Shipping; “VOC Opvarenden”; Jur Leinenga; Matthias van Rossum; Elbing voyages; Archangel voyages

DIFFERENT but LINKED DATAMODELS BASED ON COMPETENCY QUESTIONS
[Diagram; labels recovered from the figure]
Classes: dss:Record, gzmvoc:Telling, dss:Ship, gzmvoc:Schip, dss:ShipType, gzmvoc:Scheepstype, das:Voyage
Instances: gzmvoc:telling-1046-De_Berkel, gzmvoc:schip-1046-De_Berkel, gzmvoc:type-Ship, das:voyage-1918_61, __bnode_1
Properties: gzmvoc:aziatischeBemanning, dss:has_ship, gzmvoc:schip, rdfs:label, dss:scheepsnaam, gzmvoc:scheepsnaam, dss:has_shiptype, gzmvoc:has_shiptype, gzmvoc:scheepstype, dss:azRegistratieKop, gzmvoc:azAantalMatrozen, gzmvoc:telling, gzmvoc:heeft
Literals: "1046", “Schip”, “De Berkel”, “21”, “Moorse mattroosen”
Other: DAS heenreis

ACCESS IT AT HTTP://DUTCHSHIPSANDSAILORS.NL/DATA OR HTTP://SEMANTICWEB.CS.VU.NL/DSS

SELECT * WHERE {
  ?record dss:hasOriginalScan ?scan.
  ?record dss:has_kb_link ?kblink.
  ?record mdb:schip ?schip.
  ?schip mdb:scheepstype ?shiptype.
  ?shiptype skos:exactMatch ?em.
  ?em skos:broader* aat:kustvaarders.
}

Data analysis and visualisation

DIVE
MEDIA HISTORIANS AND RESEARCHERS
Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)

EXPLORATIVE SEARCH
Digital Hermeneutics: the combination of digital (Web) technology and theory of interpretation

DATA: OPENIMAGES.EU and DELPHER.NL
ENTITY EXTRACTION
CROWDTRUTH.ORG
EVENTS: CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG
SEGMENTATION & KEYFRAMES
LINKING EVENTS AND CONCEPTS TO KEYFRAMES

DATA CONNECTED IN KNOWLEDGE GRAPH
DIVE:MEDIA OBJECT, SEM:EVENT, SEM:PLACE, SEM:TIME, SEM:ACTOR, SKOS:CONCEPT, OA:ANNOTATION
LINKS TO EUROPEANA
LINKS TO DBPEDIA

“DIGITAL SUBMARINE” INTERFACE
DIVE.BEELDENGELUID.NL

BiographyNet
Starting Point: Biography Portal of the Netherlands; www.biografischportaal.nl
125,000 short biographical descriptions with limited metadata from 23 Dutch biographical dictionaries (~76,000 individuals)
What kind of historical questions can be answered with these data with the help of computational methods?
Biographynet.nl

“Johan Rudolph Thorbecke was born in 1798 on 14 January in Zwolle and comes from a half-German…”
Highlighted: 14 January

Linked Data for BiographyNet
[Diagram; labels recovered from the figure]
Thorbecke; Biographical Description; Provenance Meta Data: NNBW; Person Meta Data: “Thorbecke”; Biography Parts; Event: Birth, 1798
Enrichment: Biographical Description; NLP Tool; Person Meta Data; Event: Birth, Zwolle, 1798-01-14

Provenance in Biographynet
To ensure the credibility of the demonstrator, to evaluate its performance, and to improve the academic status of the tool
Information involved → Sources, but also: NER input data, etc.
Processes involved → All steps in enrichment, aggregation…
People involved → Who was responsible for pipeline, tool, …
* Daniel Garijo, Yolanda Gil; http://www.opmw.org/model/p-plan

Interface for historians

Framework: generic solutions with historians
1. Preprocess, clean, model, link and enrich data in collaboration with domain experts;
2. Access heterogeneous datasets in a convenient way to get an intuition of the character and anomalies of the (linked) data;
3. Perform arbitrary queries to retrieve results relevant to their research questions;
4. Verify the veracity of query results by following provenance links to original material;
5. Retrieve and analyze the data with the tool of preference;
6. Republish and share results.

Historical tool criticism
… willingness from historians to invest the time to learn about computer processes (at least the basic principles)
Possibilities for education at universities to bridge the gap between computer science and humanities studies and make tool criticism an integral part of students' curricula
“Why do we still teach history students to decipher 17th-century handwriting, but not SQL?”

Thank you!
Victor de Boer
http://victordeboer.com
v.de.boer@vu.nl
@victordeboer

Verrijkt Koninkrijk
National-Socialist: 29%
Social-Democrat: 21%
Protestant: 13%
Liberal: 12%
Roman-Catholic: 12%
Communist: 8%
Jewish: 5%
http://semanticweb.cs.vu.nl/verrijktkoninkrijk/
http://search.loedejongdigitaal.nl/
Results are links to paragraphs
Re-usability
http://qhp.science.uva.nl/
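
Sketch: how the Dutch Ships and Sailors query uses links
The SPARQL query on the Dutch Ships and Sailors slide only works because ship types are linked (skos:exactMatch) to a thesaurus with a hierarchy (skos:broader*). The Python below is a minimal, hedged illustration of that idea on a toy in-memory graph; it is not the project's code and does not contact the real endpoint. The property names and aat:kustvaarders come from the slide; every "ex:" identifier and the helper names (objects, reaches, coastal_records) are made up for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Toy graph. Property names follow the query on the slide; all "ex:" nodes
# are hypothetical placeholders, not real records or thesaurus concepts.
TRIPLES: List[Triple] = [
    ("ex:record1", "dss:hasOriginalScan", "ex:scan1"),
    ("ex:record1", "dss:has_kb_link", "ex:kb-article1"),
    ("ex:record1", "mdb:schip", "ex:ship1"),
    ("ex:ship1", "mdb:scheepstype", "ex:type1"),
    ("ex:type1", "skos:exactMatch", "ex:aat-concept1"),
    ("ex:aat-concept1", "skos:broader", "aat:kustvaarders"),
    # A second record whose ship type maps to an unrelated concept.
    ("ex:record2", "dss:hasOriginalScan", "ex:scan2"),
    ("ex:record2", "dss:has_kb_link", "ex:kb-article2"),
    ("ex:record2", "mdb:schip", "ex:ship2"),
    ("ex:ship2", "mdb:scheepstype", "ex:type2"),
    ("ex:type2", "skos:exactMatch", "ex:aat-concept2"),
    ("ex:aat-concept2", "skos:broader", "ex:aat-other"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def reaches(start: str, predicate: str, target: str) -> bool:
    """Zero-or-more steps along predicate, like the path skos:broader*."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(objects(node, predicate))
    return False


def coastal_records() -> List[Dict[str, str]]:
    """Mirror the shape of the slide's query, one triple pattern per loop."""
    rows: List[Dict[str, str]] = []
    records = sorted({s for s, p, _ in TRIPLES if p == "dss:hasOriginalScan"})
    for record in records:
        for scan in objects(record, "dss:hasOriginalScan"):
            for kblink in objects(record, "dss:has_kb_link"):
                for schip in objects(record, "mdb:schip"):
                    for shiptype in objects(schip, "mdb:scheepstype"):
                        for em in objects(shiptype, "skos:exactMatch"):
                            if reaches(em, "skos:broader", "aat:kustvaarders"):
                                rows.append({
                                    "record": record, "scan": scan,
                                    "kblink": kblink, "schip": schip,
                                    "shiptype": shiptype, "em": em,
                                })
    return rows


if __name__ == "__main__":
    for row in coastal_records():
        print(row)
```

In this toy graph only the record whose ship type is linked to a concept under aat:kustvaarders is returned; the second record is filtered out because its matched concept sits elsewhere in the hierarchy. A real triple store evaluates the same patterns far more efficiently.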