Wikidata query with values from a CSV file


I would like to run a Wikidata query for many values that are listed in a column of a CSV file on my computer. How can I load the values from the CSV file into the Wikidata query automatically, without copying them in manually?

So far I have worked with the Wikidata query in Visual Studio Code.

This is the query I made for one person:

SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
   VALUES ?VIAF { "2467372" }
   ?Author wdt:P214 ?VIAF ;
       wdt:P19 ?birthLocation .

   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}

I want to automatically load many values from the column of my CSV file into the curly brackets of the VALUES clause in the query above.

CodePudding user response:

So, say you have a file my_file.csv with the following content:

Q123
Q456
Q789

First of all, import a Python library for reading files (such as fileinput).

Then declare the pattern that you want to use for your query, using %s as a placeholder for the identifiers.
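For instance, a trimmed-down sketch of such a pattern (the full version appears in the final code below):

query_pattern = 'SELECT ?Author WHERE { VALUES ?Author { %s } }'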

Now, build a list of identifiers as follows:

identifiers = ['wd:' + line.strip() for line in fileinput.input(files='my_file.csv')]

And finally, join the list using a space character as a separator and pass the resulting string to your query pattern:

query = query_pattern % ' '.join(identifiers)

This is the final code:

import fileinput

filename = 'my_file.csv'
query_pattern = '''SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
   VALUES ?Author { %s }
   ?Author wdt:P214 ?VIAF ;
       wdt:P19 ?birthLocation .

   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}'''

identifiers = ['wd:' + line.strip() for line in fileinput.input(files=filename)]
query = query_pattern % ' '.join(identifiers)
print(query)

Executing it, you'll get:

SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
   VALUES ?Author { wd:Q123 wd:Q456 wd:Q789 }
   ?Author wdt:P214 ?VIAF ;
       wdt:P19 ?birthLocation .

   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
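
If you want to execute the generated query directly instead of pasting it into the Wikidata Query Service, one option is to send it to the public SPARQL endpoint, for example with the requests library. A minimal sketch, assuming requests is installed and reusing the query string built above (the User-Agent value is a placeholder):

import requests

# Send the generated query to the Wikidata SPARQL endpoint, asking for JSON results.
response = requests.get(
    'https://query.wikidata.org/sparql',
    params={'query': query, 'format': 'json'},
    headers={'User-Agent': 'csv-to-wikidata-example/0.1 (your contact info)'},
)
response.raise_for_status()
for binding in response.json()['results']['bindings']:
    print(binding['AuthorLabel']['value'], binding['birthLocation']['value'])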

CodePudding user response:

First, I feel compelled to point out that if you don't already know a programming language, OpenRefine can do this for you in a few clicks.

Having said that, here's a basic Python program that accomplishes what you literally asked for, namely reading a set of VIAF IDs and adding them to your query:

import csv


def expand_query(ids):
    # Wrap each ID in quotes and join with spaces to build the VALUES list.
    query = """
    SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation ?birthLocationLabel WHERE {
        VALUES ?VIAF {
        """ + '"' + '" "'.join(ids) + '"' + """
        }
        ?Author wdt:P214 ?VIAF.
        OPTIONAL { ?Author wdt:P19 ?birthLocation. }
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de,en". }
    }
    """
    return query


def main():
    with open('../data/authors.csv', "rt") as csvfile:
        csvreader = csv.DictReader(csvfile, dialect=csv.excel)
        ids = [row["viaf"] for row in csvreader]
        print(expand_query(ids))


if __name__ == "__main__":
    main()

It expects a CSV file with a column called viaf and ignores all other columns, e.g.:

name,viaf
Douglas Adams,113230702
William Shakespeare,96994048
Bertolt Brecht,2467372

I've tweaked the query slightly to:

  • always output a row even if the birth location isn't available
  • output the label for the birth location
  • add English as an additional fallback language for labels

This makes the assumption that you've got a small enough set of identifiers to be able to use a single query, but you can extend it to:

  • read identifiers in batches of a convenient size
  • use SPARQLWrapper to send the query to the Wikidata SPARQL endpoint and parse the results (see the sketch below)
  • write the results to a different CSV file in chunks as they're received
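
For example, here's a minimal sketch of the batching and SPARQLWrapper ideas, reusing expand_query from above (the batch size and agent string are arbitrary placeholder choices; install the library with pip install sparqlwrapper):

import csv

from SPARQLWrapper import SPARQLWrapper, JSON

BATCH_SIZE = 50  # arbitrary; keeps each VALUES list comfortably small


def run_query(query):
    # Wikimedia asks API clients to identify themselves via the agent string.
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                           agent="csv-to-wikidata-example/0.1")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()


def main():
    with open('../data/authors.csv', "rt") as csvfile:
        ids = [row["viaf"] for row in csv.DictReader(csvfile, dialect=csv.excel)]
    for start in range(0, len(ids), BATCH_SIZE):
        results = run_query(expand_query(ids[start:start + BATCH_SIZE]))
        for row in results["results"]["bindings"]:
            # birthLocationLabel may be absent because of the OPTIONAL clause
            print(row["AuthorLabel"]["value"],
                  row.get("birthLocationLabel", {}).get("value", ""))


if __name__ == "__main__":
    main()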