Generating Linked Open Data from heterogeneous Open Data sources with RML
Anastasia Dimou, anastasia.dimou@ugent.be, @natadimou
Ghent University – iMinds – Multimedia Lab, rml.io

Semantic Web enabled applications rely on data represented as Linked Open Data, semantically annotated using ontologies and vocabularies. Yet most of the data that we would like to query as Linked Open Data exists in formats other than RDF:
12% of web pages contain any structured data, such as microformats, microdata or RDFa; these pages come from 6% of all websites.
According to ProgrammableWeb.org there are over 11,000 APIs. Only 74 of them return results in RDF, but more than 5,000 return results in JSON or XML.

LOD enables Semantic Web apps, and Semantic Web apps demand LOD.
[Figure: Open Data, Linked Open Data and the Semantic Web.]

Many languages, tools and approaches have been proposed to convert data from different data sources to RDF. Existing mapping solutions take one of two routes. Some map per format and per source, and therefore focus more on handling the source than on modeling the domain. Others provide case-specific solutions that model the domain better.
[Figure: the data owner/publisher defines R2RML mappings that an R2RML processor executes over a DB to generate RDF, while CSV, JSON and XML sources each produce RDF through their own separate mappings.]

The mappings are defined independently. They disregard possible prior definitions, links to other resources, and the (re)use of the same ontologies for similar data. They are then aligned and interlinked manually, either by reconstructing the same URIs or by post-mapping interlinking.

A well-considered policy is required when mapping data to RDF in the context of a certain knowledge domain. Such a policy shifts the focus FROM modeling the data of a source TO modeling the domain-level knowledge using the available data source(s). It requires:
uniform mapping definitions, to describe mapping rules for heterogeneous sources;
interoperable mapping definitions, to allow the re-use of mapping rules across different implementations;
reusable mapping definitions, to allow the re-use of mapping rules for representing data in the same or different formats.
[Figure: instead of per-source R2RML mappings, the data owner/publisher defines mapping definitions that a single processor executes over DB, CSV, JSON and XML sources to generate RDF.]

RDF Mapping Language (RML)

RML is a generic, scalable mapping language for mapping heterogeneous data into RDF in an integrable and interoperable fashion. It is a superset of the W3C-recommended R2RML mapping language (http://rml.io).

[Figure: RML building blocks. A Triples Map has a Logical Source (source, iterator, reference formulation), a Subject Map and Predicate-Object Maps, each with a Predicate Map and an Object Map. Subject, predicate and object maps are Term Maps, defined by a template, a constant or a reference. A Referencing Object Map points to another (parent) Triples Map, optionally through a Join Condition with a parent and a child column.]

Outline: generating triples; reusing mappings; aligning & interlinking.

Generating triples

A Triples Map consists of a Subject Map and Predicate-Object Maps, each combining a Predicate Map and an Object Map. All of these are Term Maps, whose values come from a template, a constant or a reference. Each Term Map generates an RDF term: a URI, a literal or a blank node.

Take the following artist data:

    NAME                     BIRTH_DATE   DEATH_DATE
    Robert Theodore McCall   1919-12-23   2010-02-26
    Ronald Anderson          1929-12-06

The Subject Map below generates one subject per row and assigns it a class:

    <#ArtistMapping>
      rr:subjectMap [
        rr:template "http://ex.com/{NAME}" ;
        rr:class ex:Person
      ] .

For the first row, it produces <http://ex.com/Robert%20Theodore%20McCall> a ex:Person .

The Predicate-Object Map below adds a predicate and an object taken from a column:

    <#ArtistMapping>
      rr:predicateObjectMap [
        rr:predicate ex:birth_date ;
        rr:objectMap [ rr:column "BIRTH_DATE" ]
      ] .

For the first row, it adds ex:birth_date "1919-12-23" .

Logical Source

artworks.json:

    [
      ...
      {
        "Title": "Apollo 11 Crew",
        "Artist": "Ronald Anderson",
        "Ref": "NPG_70_36",
        "Sitter": [
          { "Name": "Neil Armstrong", "Birth Date": "1930-08-05" },
          { "Name": "Buzz Aldrin", "Birth Date": "1930-01-20" },
          { "Name": "Michael Collins" }
        ],
        "DateOfWork": "1969"
      },
      {
        "Title": "Neil Armstrong",
        "Artist": "Robert Theodore McCall",
        "Ref": "S_NPG_2010_51",
        "Sitter": [ { "Name": "Neil Armstrong" } ],
        "DateOfWork": "2009"
      },
      ...
    ]
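The artworks.json listing above nests a Sitter array inside each artwork. The same file can therefore be read as a sequence of artworks or as a sequence of sitters, and choosing between the two is the job of the Logical Source's iterator. The Python sketch below shows both iteration units over the two artworks shown, leaving out the elided entries. It uses plain loops over the parsed JSON rather than a JSONPath library. The helper names and the sitter subject pattern http://ex.com/{Name} are hypothetical. The ex: prefix is assumed to expand to http://ex.com/.

```python
import json
from urllib.parse import quote

EX = "http://ex.com/"  # assumed expansion of the ex: prefix

# The two artworks from artworks.json, without the elided entries.
artworks = json.loads("""[
  {"Title": "Apollo 11 Crew", "Artist": "Ronald Anderson", "Ref": "NPG_70_36",
   "Sitter": [{"Name": "Neil Armstrong", "Birth Date": "1930-08-05"},
              {"Name": "Buzz Aldrin", "Birth Date": "1930-01-20"},
              {"Name": "Michael Collins"}],
   "DateOfWork": "1969"},
  {"Title": "Neil Armstrong", "Artist": "Robert Theodore McCall", "Ref": "S_NPG_2010_51",
   "Sitter": [{"Name": "Neil Armstrong"}], "DateOfWork": "2009"}
]""")


def iterate_artworks(doc):
    """One iteration per element of the top-level array."""
    yield from doc


def iterate_sitters(doc):
    """One iteration per element of every nested Sitter array."""
    for artwork in doc:
        yield from artwork.get("Sitter", [])


def birth_date_triples(records):
    """Apply a hypothetical subject pattern http://ex.com/{Name} plus an ex:birth_date object."""
    triples = []
    for record in records:
        subject = f"<{EX}{quote(record['Name'], safe='')}>"
        birth_date = record.get("Birth Date")
        if birth_date is not None:  # a missing value produces no triple
            triples.append(f'{subject} <{EX}birth_date> "{birth_date}" .')
    return triples


print(len(list(iterate_artworks(artworks))))  # 2 iterations: one per artwork
print(len(list(iterate_sitters(artworks))))   # 4 iterations: one per sitter entry
for triple in birth_date_triples(iterate_sitters(artworks)):
    print(triple)
```

Michael Collins and the second Neil Armstrong entry have no birth date. They still count as iterations but yield no ex:birth_date triple. Spaces in names are percent-encoded, mirroring how R2RML fills IRI templates.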
artists.xml (the XML markup was lost in extraction; it holds the same values as the artist table: Robert Theodore McCall, 1919-12-23, 2010-02-26; Ronald Anderson, 1929-12-06)

Each Triples Map has a Logical Source. The Logical Source names the source and the Reference Formulation used to refer to its data:

    <#ArtworkMapping>
      rml:logicalSource [
        rml:source "http://ex.com/artworks.json" ;
        rml:referenceFormulation ql:JSONPath
      ] .

    <#ArtistMapping>
      rml:logicalSource [
        rml:source "artists.xml" ;
        rml:referenceFormulation ql:XPath
      ] .

An Iterator specifies what one iteration over the source is, that is, the unit to which the Term Maps are applied. Here the Sitter iterator selects each element of the nested Sitter arrays, and references are evaluated relative to the current iteration:

    <#ArtworkMapping>
      rml:logicalSource [
        rml:source "http://ex.com/artworks.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.[*]"
      ] .

    <#SitterMapping>
      rml:logicalSource [
        rml:source "http://ex.com/artworks.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.[*].Sitter[*]"
      ] .

    <#ArtistMapping>
      rml:logicalSource [
        rml:source "http://ex.com/artists.xml" ;
        rml:referenceFormulation ql:XPath ;
        rml:iterator "/Artists/Artist"
      ] ;
      rr:subjectMap [ rr:template "http://ex.com/{Name}" ] ;
      rr:predicateObjectMap [
        rr:predicate ex:death_date ;
        rr:objectMap [ rml:reference "Death_Date" ]
      ] .

For Robert Theodore McCall this generates

    <http://ex.com/Robert%20Theodore%20McCall> ex:death_date "2010-02-26" .

Ronald Anderson has no death date, so no ex:death_date triple is generated for him.

Reusing mappings

Avoid redefining and replicating URI patterns, and avoid remodeling the same domain. Instead, uniquely define the URI pattern that generates a resource and refer to its definition. For example, a performance in JSON and an exhibition in XML both carry a location:

    {
      ...
      "Performance": {
        "Perf_ID": "567",
        "Location": { "lat": "51.043611", "long": "3.717222" }
      },
      ...
    }

events XML (markup lost in extraction; location values: 51.076891, 3.717222)
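Uniquely defining a URI pattern means that every mapping producing a location calls one definition rather than repeating it. The Python sketch below applies that idea to the performance and exhibition listings above. It assumes a location pattern of the form http://ex.com/{lat},{long} and an ex: prefix of http://ex.com/. Because the XML markup was lost, the layout of events_xml is partly hypothetical. The Exhibition element and its id attribute follow the /Events/Exhibition paths and the {@id} template used in these mappings, while the Location, lat and long element names are assumptions.

```python
import json
import xml.etree.ElementTree as ET

EX = "http://ex.com/"  # assumed expansion of the ex: prefix


def location_uri(lat, lon):
    """The single definition of the location URI pattern {lat},{long}."""
    return f"{EX}{lat},{lon}"


def location_triples(lat, lon):
    """Triples describing a location; every caller gets the same URI for the same coordinates."""
    subject = f"<{location_uri(lat, lon)}>"
    return [f'{subject} <{EX}lat> "{lat}" .', f'{subject} <{EX}long> "{lon}" .']


performance_json = """{"Performance": {"Perf_ID": "567",
    "Location": {"lat": "51.043611", "long": "3.717222"}}}"""

# Hypothetical XML layout: the original markup was lost in extraction.
events_xml = """<Events>
  <Exhibition id="398">
    <Location><lat>51.076891</lat><long>3.717222</long></Location>
  </Exhibition>
</Events>"""

triples = []

# JSON source: one performance.
performance = json.loads(performance_json)["Performance"]
lat, lon = performance["Location"]["lat"], performance["Location"]["long"]
triples.append(f"<{EX}{performance['Perf_ID']}> <{EX}location> <{location_uri(lat, lon)}> .")
triples += location_triples(lat, lon)

# XML source: every exhibition, reusing the same location_uri definition.
for exhibition in ET.fromstring(events_xml).iter("Exhibition"):
    lat, lon = exhibition.findtext("Location/lat"), exhibition.findtext("Location/long")
    triples.append(f"<{EX}{exhibition.get('id')}> <{EX}location> <{location_uri(lat, lon)}> .")
    triples += location_triples(lat, lon)

print("\n".join(triples))
```

Changing the pattern now means editing location_uri alone, and both sources follow automatically.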
Both the performance mapping and the event mapping refer to the same <#LocationMapping> instead of redefining the location URI pattern:

    <#PerformancesMapping>
      rr:subjectMap [ rr:template "http://ex.com/{Perf_ID}" ] ;
      rr:predicateObjectMap [
        rr:predicate ex:location ;
        rr:objectMap [ rr:parentTriplesMap <#LocationMapping> ]
      ] .

    <#EventsMapping>
      rr:subjectMap [ rr:template "http://ex.com/{@id}" ] ;
      rr:predicateObjectMap [
        rr:predicate ex:location ;
        rr:objectMap [ rr:parentTriplesMap <#LocationMapping> ]
      ] .

    <#LocationMapping>
      rr:subjectMap [ rr:template "http://ex.com/{lat},{long}" ] ;
      rr:predicateObjectMap [
        rr:predicate ex:long ;
        rr:objectMap [ rml:reference "long" ]
      ] ;
      rr:predicateObjectMap [
        rr:predicate ex:lat ;
        rr:objectMap [ rml:reference "lat" ]
      ] .

This generates:

    <http://ex.com/51.043611,3.717222> ex:lat "51.043611" ; ex:long "3.717222" .
    <http://ex.com/51.076891,3.717222> ex:lat "51.076891" ; ex:long "3.717222" .

Aligning & interlinking

The performance in JSON also references its venue:

    {
      ...
      "Performance": {
        "Perf_ID": "567",
        "Venue": { "Name": "STAM", "Venue_ID": "78" },
        "Location": { "long": "3.717222", "lat": "51.043611" }
      },
      ...
    }

The venue is generated by its own Triples Map, and the performance mapping refers to it:

    <#PerformancesMapping>
      rr:subjectMap [ rr:template "http://ex.com/{Perf_ID}" ] ;
      rr:predicateObjectMap [
        rr:predicate ex:venue ;
        rr:objectMap [ rr:parentTriplesMap <#VenueMapping> ]
      ] .

    <#VenueMapping>
      rml:logicalSource [
        rml:source "http://ex.com/performances.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.Performance.Venue.[*]"
      ] ;
      rr:subjectMap [
        rr:template "http://ex.com/{Venue_ID}" ;
        rr:class ex:Venue
      ] .

This generates ex:567 ex:venue ex:78 .

The events XML, however, identifies the same venue only by its name (markup lost in extraction; value: STAM). The goal is to link both the performance and the exhibition to the same venue resource:

    ex:567 ex:venue ex:78 .
    ex:398 ex:venue ex:78 .
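The target output above requires the exhibition to reuse the URI ex:78, which the JSON source builds from Venue_ID, even though the exhibition only names its venue ("STAM"). The Python sketch below shows the idea behind an R2RML/RML join condition. Child records (exhibitions) are matched to parent records (venues) on equal values, and the generated object is the parent's subject URI. Several details are assumptions made for illustration: the in-memory records standing in for extracted data, their field names, the helper names join and venue_triples, and the ex: prefix http://ex.com/. A real processor would obtain these values through each logical source's reference formulation.

```python
EX = "http://ex.com/"  # assumed expansion of the ex: prefix

# Parent records: venues extracted from the performances JSON.
venues = [{"Name": "STAM", "Venue_ID": "78"}]
# Child records: exhibitions extracted from the events XML (hypothetical field names).
exhibitions = [{"id": "398", "Venue": "STAM"}]


def join(children, parents, child_key, parent_key):
    """Pair each child with every parent whose parent_key value equals the child's child_key value."""
    parents_by_value = {}
    for parent in parents:
        parents_by_value.setdefault(parent[parent_key], []).append(parent)
    for child in children:
        for parent in parents_by_value.get(child[child_key], []):
            yield child, parent


def venue_triples(children, parents):
    # The object reuses the parent's subject pattern http://ex.com/{Venue_ID}.
    return [
        f"<{EX}{child['id']}> <{EX}venue> <{EX}{parent['Venue_ID']}> ."
        for child, parent in join(children, parents, child_key="Venue", parent_key="Name")
    ]


print(venue_triples(exhibitions, venues))
```

Indexing the parents by their join value turns each child lookup into a dictionary access rather than a scan over all parents. A child without a matching parent simply produces no ex:venue triple.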
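A Join Condition achieves this. The exhibition mapping (the child) refers to <#VenueMapping> (the parent) and joins on the venue name, so the exhibition reuses the venue URI generated from the JSON source:

    <#EventsMapping>
      rr:subjectMap [ rr:template "http://ex.com/{@id}" ] ;
      rr:predicateObjectMap [
        rr:predicate ex:venue ;
        rr:objectMap [
          rr:parentTriplesMap <#VenueMapping> ;
          rr:joinCondition [
            rr:child "/Events/Exhibition/Venue" ;
            rr:parent "$.Performance.Venue.Name"
          ]
        ]
      ] .

Without the join, each source would mint its own URI for the venue, and the two would have to be interlinked after mapping:

    ex:567 ex:venue ex:78 .
    ex:398 ex:venue ex:STAM .
    ex:78 owl:sameAs ex:STAM .

RML processing

[Figure: the RML Processor, consisting of an Extraction Module and a Mapping Module.]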
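The processing figure splits the RML Processor into an Extraction Module and a Mapping Module. One possible reading of that split is sketched below under heavy assumptions. In this reading, only the extraction side knows about formats: it iterates a source according to its reference formulation and resolves references. The mapping side applies the same template and predicate-object rules to any format. This is a hypothetical illustration, not the RML Processor's code. The following are all assumptions:
- the function names and the dict-based encoding of a triples map;
- the minimal iterator support ($[*] for JSON; for XML, a child path under the root element, evaluated with ElementTree);
- the artists.xml layout;
- the artwork subject pattern.

The XML element names Name and Death_Date follow the ArtistMapping example.

```python
import json
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

EX = "http://ex.com/"  # assumed expansion of the ex: prefix


# Extraction module (hypothetical): the only format-aware code.
# Each extractor yields one resolver per iteration; a resolver maps a
# reference to a value, or to None when the reference yields nothing.
def extract_json(text, iterator):
    # Simplification: only "$[*]" (every element of a top-level array).
    if iterator != "$[*]":
        raise ValueError(f"unsupported JSON iterator: {iterator}")
    for record in json.loads(text):
        yield lambda reference, record=record: record.get(reference)


def extract_xml(text, iterator):
    # Simplification: "/Root/Child" is evaluated with ElementTree as the
    # path "Child" below the root element "Root"; references are relative paths.
    root = ET.fromstring(text)
    root_tag, _, rest = iterator.lstrip("/").partition("/")
    if root.tag != root_tag:
        nodes = []
    else:
        nodes = root.findall(rest) if rest else [root]
    for node in nodes:
        yield lambda reference, node=node: node.findtext(reference)


EXTRACTORS = {"ql:JSONPath": extract_json, "ql:XPath": extract_xml}


# Mapping module (hypothetical): identical for every format.
def run_triples_map(triples_map, text):
    extract = EXTRACTORS[triples_map["referenceFormulation"]]
    triples = []
    for resolve in extract(text, triples_map["iterator"]):
        # Fill {reference} placeholders with percent-encoded values.
        subject = re.sub(
            r"\{([^}]+)\}",
            lambda match: quote(resolve(match.group(1)), safe=""),
            triples_map["template"],
        )
        for predicate, reference in triples_map["predicateObjects"]:
            value = resolve(reference)
            if value is not None:  # no value, no triple
                triples.append(f'<{subject}> <{predicate}> "{value}" .')
    return triples


artists_xml = """<Artists>
  <Artist><Name>Robert Theodore McCall</Name><Death_Date>2010-02-26</Death_Date></Artist>
  <Artist><Name>Ronald Anderson</Name></Artist>
</Artists>"""

artist_map = {
    "referenceFormulation": "ql:XPath",
    "iterator": "/Artists/Artist",
    "template": EX + "{Name}",
    "predicateObjects": [(EX + "death_date", "Death_Date")],
}

artworks_json = """[{"Title": "Apollo 11 Crew", "Ref": "NPG_70_36"},
                    {"Title": "Neil Armstrong", "Ref": "S_NPG_2010_51"}]"""

artwork_map = {
    "referenceFormulation": "ql:JSONPath",
    "iterator": "$[*]",
    "template": EX + "{Ref}",  # hypothetical subject pattern
    "predicateObjects": [(EX + "title", "Title")],
}

for triple in run_triples_map(artist_map, artists_xml) + run_triples_map(artwork_map, artworks_json):
    print(triple)
```

Supporting a new format in this design means adding one extractor to EXTRACTORS, while run_triples_map stays unchanged.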