2019-01-29
What is a typical use case for a PoC?
. . .
nepotism /ˈnɛpətɪz(ə)m/
The practice among those with power or influence of favouring relatives or friends, especially by giving them jobs.
. . .
Mid 17th century: from French népotisme, from Italian nepotismo, from nipote ‘nephew’ (with reference to privileges bestowed on the ‘nephews’ of popes, who were in many cases their illegitimate sons).
Our dataset is a simplified version of the public IMDB dataset
https://toolbox.google.com/datasetsearch
OntoRefine text facets allow quick bulk-editing of values
United States is normalised to USA in 122 cells
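The bulk edit above can be sketched outside OntoRefine as a simple value mapping. This is only an illustration in plain Python, not how OntoRefine implements text facets; the `normalise` helper and the sample column are hypothetical.

```python
from collections import Counter

def normalise(values, mapping):
    """Replace mapped values and report how many cells were changed."""
    changed = 0
    result = []
    for value in values:
        key = value.strip()
        if key in mapping:
            result.append(mapping[key])
            changed += 1
        else:
            result.append(value)
    return result, changed

# Hypothetical sample column; the real dataset has 122 affected cells.
countries = ["United States", "USA", "Bulgaria", "United States"]
cleaned, changed = normalise(countries, {"United States": "USA"})

# A Counter gives a facet-like view: each distinct value with its cell count.
facet = Counter(cleaned)
```

After the edit the facet shows one entry for USA instead of two competing spellings.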
Split columns according to a separator character
Edit the text in the cells
Remove whitespace so that the string can be used in a URL/IRI
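The three clean-up steps above might look as follows in plain Python. This is a minimal sketch: the separator, the sample cell values and the `http://example.org/resource/` base are assumptions made for illustration, not values from the training dataset.

```python
from urllib.parse import quote

def split_cell(cell, separator=","):
    """Split a multi-valued cell and trim each part, dropping empty parts."""
    return [part.strip() for part in cell.split(separator) if part.strip()]

def to_iri(label, base="http://example.org/resource/"):
    """Collapse whitespace to underscores and percent-encode the rest."""
    local = "_".join(label.split())
    return base + quote(local, safe="_")

# Hypothetical cell values.
genres = split_cell("Action, Adventure ,Sci-Fi")
iri = to_iri("  The  Matrix ")
```

Percent-encoding keeps non-ASCII labels usable in an IRI as well as removing the whitespace.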
Use a reconciliation service to match strings to real world objects.
Bulgaria > https://www.wikidata.org/wiki/Q219
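Conceptually, reconciliation maps a cleaned string to an identifier. The sketch below uses a small local lookup table as a stand-in for a real reconciliation service, which would query a remote endpoint and return ranked candidates. The `reconcile` function is hypothetical; the table holds only the Bulgaria match shown above.

```python
WIKIDATA_ENTITY = "https://www.wikidata.org/wiki/"

# Stand-in for a reconciliation service: a tiny local lookup table.
KNOWN_ENTITIES = {"bulgaria": "Q219"}

def reconcile(label):
    """Return the Wikidata page for a label, or None if nothing matches."""
    qid = KNOWN_ENTITIES.get(label.strip().lower())
    return WIKIDATA_ENTITY + qid if qid else None

# Unmatched strings stay unresolved instead of being guessed.
matches = {label: reconcile(label) for label in ["Bulgaria", "Atlantis"]}
```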
Moving from tabular data to linked data
Here is what our cleaned-up table looks like…
… but here it is transformed into RDF.
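To make the table-to-RDF step concrete, here is a rough sketch that turns one row into N-Triples lines. The `http://example.org/` namespace, the column names and the sample row are all assumptions; the actual mapping used in the training may differ.

```python
EX = "http://example.org/"  # hypothetical namespace

def literal(value):
    """Render a string as an N-Triples literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def row_to_triples(row):
    """Map one table row to triples: the row id becomes the subject IRI."""
    subject = f"<{EX}movie/{row['id']}>"
    return [
        f"{subject} <{EX}title> {literal(row['title'])} .",
        f"{subject} <{EX}country> {literal(row['country'])} .",
    ]

# Hypothetical row from the cleaned-up table.
rows = [{"id": "m1", "title": "Example Movie", "country": "USA"}]
triples = [t for row in rows for t in row_to_triples(row)]
```

Each column becomes a predicate and each cell becomes an object, so a table with n columns yields up to n triples per row.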
Output from our ETL procedure
Does this model contain all the data we need?
Incorporating data from an additional data source.
Can we simplify things?
Single symmetric relation to use in a straightforward manner.
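A symmetric relation holds in both directions, so a query never has to check which way an edge was recorded. A minimal sketch of that idea, with hypothetical person identifiers:

```python
def symmetric_closure(pairs):
    """Add the reverse of every pair so the relation holds in both directions."""
    closed = set()
    for a, b in pairs:
        closed.add((a, b))
        closed.add((b, a))
    return closed

# Hypothetical identifiers: one stored edge yields both directions.
related = symmetric_closure({("alice", "bob")})
```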
Can we simplify things further?
Three relations transformed into a single one.
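Collapsing several specific relations into one general relation can be sketched as a predicate rewrite. The relation names `hasParent`, `hasSibling`, `hasSpouse` and `relatedTo` are assumptions chosen for illustration; the slides do not name the three relations.

```python
# Hypothetical names for the three relations being merged.
FAMILY_PREDICATES = {"hasParent", "hasSibling", "hasSpouse"}

def merge_relations(triples, merged="relatedTo"):
    """Rewrite each family predicate to the single merged predicate."""
    return {
        (s, merged, o) if p in FAMILY_PREDICATES else (s, p, o)
        for s, p, o in triples
    }

# Hypothetical input: three family edges and one unrelated edge.
source = {
    ("a", "hasParent", "b"),
    ("a", "hasSibling", "c"),
    ("a", "hasSpouse", "d"),
    ("a", "actedIn", "m1"),
}
simplified = merge_relations(source)
```

The trade-off is that the merged model can no longer distinguish the kind of family tie, which is acceptable only if the question does not need it.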
But we are still working with two disconnected parts.
Now we have everything we need to ask our question.
What if we want to ask a more complex question?
At later stages we can rework the model, which will then require corresponding changes to the ETL procedure.
Visualise data with Google Charts in GraphDB
Highly configurable network visualisation using SPARQL
Download materials:
http://presentations.ontotext.com/etl-files.zip
Download and install GraphDB Free
Load post-ETL repository:
https://presentations.ontotext.com/movieDB_ETL.trig
Download SPARQL queries for next section:
@ Semantic PoC Training