I would like to do a Wikidata query of many values that are listed in a column of a CSV file on my computer. How can I load the values from the CSV file into the Wikidata query automatically without copying them in manually?
So far I have worked with the Wikidata query in Visual Studio Code.
This is the query I made for one person:
SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
VALUES ?VIAF {"2467372"}
?Author wdt:P214 ?VIAF ;
wdt:P19 ?birthLocation .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
I want to automatically load many values into the curly brackets of the query above from the column of my CSV file.
CodePudding user response:
So, say you have a file my_file.csv with the following content:
Q123
Q456
Q789
First of all, import a Python library for reading files (such as fileinput).
Then declare the pattern that you want to use for your query, using %s as a placeholder for the identifiers.
Now, build a list of identifiers as follows:
identifiers = ['wd:' + line.strip() for line in fileinput.input(files='my_file.csv')]
And finally join the list using a space character as separator and pass this string to your query pattern:
query = query_pattern % ' '.join(identifiers)
This is the final code:
import fileinput
filename = 'my_file.csv'
query_pattern = '''SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
  VALUES ?Author { %s }
  ?Author wdt:P214 ?VIAF ;
          wdt:P19 ?birthLocation .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}'''
identifiers = ['wd:' + line.strip() for line in fileinput.input(files=filename)]
query = query_pattern % ' '.join(identifiers)
print(query)
Executing it, you'll get:
SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
  VALUES ?Author { wd:Q123 wd:Q456 wd:Q789 }
  ?Author wdt:P214 ?VIAF ;
          wdt:P19 ?birthLocation .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
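Once the query string is built, it still has to be sent somewhere. Here is a minimal sketch of preparing the request for the public Wikidata Query Service endpoint using only the standard library; the helper name build_request is my own, not part of the answer above:

```python
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def build_request(query: str) -> urllib.request.Request:
    """Build a GET request for the Wikidata SPARQL endpoint, asking for JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # The endpoint expects a descriptive User-Agent; this one is a placeholder.
    return urllib.request.Request(url, headers={"User-Agent": "csv-to-sparql-example/0.1"})

req = build_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
print(req.full_url)
# Actually sending it needs network access:
# with urllib.request.urlopen(req) as resp:
#     data = resp.read()
```

Passing your generated query variable instead of the inline SELECT would run the real lookup.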
CodePudding user response:
First, I feel compelled to point out that if you don't already know a programming language, OpenRefine can do this for you in a few clicks.
Having said that, here's a basic Python program that does what you literally asked for, reading a set of VIAF IDs and adding them to your query:
import csv

def expand_query(ids):
    query = """
SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation ?birthLocationLabel WHERE {
  VALUES ?VIAF {
""" + '"' + '" "'.join(ids) + '"' + """
  }
  ?Author wdt:P214 ?VIAF.
  OPTIONAL { ?Author wdt:P19 ?birthLocation. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de,en". }
}
"""
    return query

def main():
    with open('../data/authors.csv', "rt") as csvfile:
        csvreader = csv.DictReader(csvfile, dialect=csv.excel)
        ids = [row["viaf"] for row in csvreader]
    print(expand_query(ids))

if __name__ == "__main__":
    main()
It expects a CSV file with a column called viaf and will ignore all other columns, e.g.
name,viaf
Douglas Adams,113230702
William Shakespeare,96994048
Bertolt Brecht,2467372
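For those three example rows, the ID extraction can be checked without touching the filesystem by feeding csv.DictReader an in-memory string (a small verification sketch, not part of the script above):

```python
import csv
import io

# The example CSV from above, as an in-memory file.
sample = (
    "name,viaf\n"
    "Douglas Adams,113230702\n"
    "William Shakespeare,96994048\n"
    "Bertolt Brecht,2467372\n"
)

csvreader = csv.DictReader(io.StringIO(sample), dialect=csv.excel)
ids = [row["viaf"] for row in csvreader]
print(ids)  # → ['113230702', '96994048', '2467372']

# This is the body the VALUES clause receives after joining:
values_body = '"' + '" "'.join(ids) + '"'
print(values_body)  # → "113230702" "96994048" "2467372"
```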
I've tweaked the query slightly to:
- always output a row even if the birth location isn't available
- output the label for the birth location
- add English as an additional fallback language for labels
This makes the assumption that you've got a small enough set of identifiers to be able to use a single query, but you can extend it to:
- read identifiers in batches of a convenient size
- use SPARQLWrapper to send each query to the Wikidata SPARQL endpoint and parse the results
- write the results to a different CSV file in chunks as they're received
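The batching idea in the first bullet can be sketched like this; the batch size of 50 is an arbitrary choice, picked only so that each generated query stays well below the endpoint's timeout limits:

```python
def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Stand-in for a long list of VIAF IDs read from the CSV.
ids = [str(n) for n in range(125)]

batches = list(batched(ids, 50))
print([len(b) for b in batches])  # → [50, 50, 25]
```

Each batch would then be passed through expand_query, sent to the endpoint (e.g. via SPARQLWrapper), and its rows appended to the output CSV before the next batch is requested.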