I would like to do a Wikidata query of many values that are listed in a column of a CSV file on my computer. How can I load the values from the CSV file into the Wikidata query automatically without copying them in manually?
So far I have worked with the Wikidata query in Visual Studio Code.
This is the query I made for one person:
SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
VALUES ?VIAF {"2467372"}
?Author wdt:P214 ?VIAF ;
wdt:P19 ?birthLocation .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
I want to automatically load many values into the curly brackets of the query above from the column of my CSV file.
CodePudding user response:
So, say you have a file my_file.csv with the following content:
Q123
Q456
Q789
First of all, import a Python library for reading files (such as fileinput).
Then declare the pattern that you want to use for your query, using %s as a placeholder for the identifiers.
Now, build a list of identifiers as follows:
identifiers = ['wd:' + line.strip() for line in fileinput.input(files='my_file.csv')]
And finally join the list using a space character as separator and pass this string to your query pattern:
query = query_pattern % ' '.join(identifiers)
This is the final code:
import fileinput
filename = 'my_file.csv'
query_pattern = '''SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
  VALUES ?Author { %s }
  ?Author wdt:P214 ?VIAF ;
          wdt:P19 ?birthLocation .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}'''
identifiers = ['wd:' + line.strip() for line in fileinput.input(files=filename)]
query = query_pattern % ' '.join(identifiers)
print(query)
Executing it, you'll get:
SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation
WHERE {
  VALUES ?Author { wd:Q123 wd:Q456 wd:Q789 }
  ?Author wdt:P214 ?VIAF ;
          wdt:P19 ?birthLocation .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
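Once the query string is built, it still has to be sent somewhere. Here is a minimal sketch of preparing the request for the public Wikidata Query Service endpoint using only the standard library; the helper name build_request is my own, not part of the answer above:

```python
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def build_request(query: str) -> urllib.request.Request:
    """Build a GET request for the Wikidata SPARQL endpoint, asking for JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # The endpoint expects a descriptive User-Agent; this one is a placeholder.
    return urllib.request.Request(url, headers={"User-Agent": "csv-to-sparql-example/0.1"})

req = build_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
print(req.full_url)
# Actually sending it needs network access:
# with urllib.request.urlopen(req) as resp:
#     data = resp.read()
```

Passing your generated query variable instead of the inline SELECT would run the real lookup.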
CodePudding user response:
First, I feel compelled to point out that if you don't already know a programming language, OpenRefine can do this for you in a few clicks.
Having said that, here's a basic Python program that does what you literally asked for, reading a set of VIAF IDs and adding them to your query:
import csv

def expand_query(ids):
    query = """
SELECT ?Author ?AuthorLabel ?VIAF ?birthLocation ?birthLocationLabel WHERE {
  VALUES ?VIAF {
""" + '"' + '" "'.join(ids) + '"' + """
  }
  ?Author wdt:P214 ?VIAF.
  OPTIONAL { ?Author wdt:P19 ?birthLocation. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de,en". }
}
"""
    return query

def main():
    with open('../data/authors.csv', "rt") as csvfile:
        csvreader = csv.DictReader(csvfile, dialect=csv.excel)
        ids = [row["viaf"] for row in csvreader]
    print(expand_query(ids))

if __name__ == "__main__":
    main()
It expects a CSV file with a column called viaf and will ignore all other columns, e.g.
name,viaf
Douglas Adams,113230702
William Shakespeare,96994048
Bertolt Brecht,2467372
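For those three example rows, the ID extraction can be checked without touching the filesystem by feeding csv.DictReader an in-memory string (a small verification sketch, not part of the script above):

```python
import csv
import io

# The example CSV from above, as an in-memory file.
sample = (
    "name,viaf\n"
    "Douglas Adams,113230702\n"
    "William Shakespeare,96994048\n"
    "Bertolt Brecht,2467372\n"
)

csvreader = csv.DictReader(io.StringIO(sample), dialect=csv.excel)
ids = [row["viaf"] for row in csvreader]
print(ids)  # → ['113230702', '96994048', '2467372']

# This is the body the VALUES clause receives after joining:
values_body = '"' + '" "'.join(ids) + '"'
print(values_body)  # → "113230702" "96994048" "2467372"
```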
I've tweaked the query slightly to:
- always output a row even if the birth location isn't available
- output the label for the birth location
- add English as an additional fallback language for labels
This makes the assumption that you've got a small enough set of identifiers to be able to use a single query, but you can extend it to:
- read identifiers in batches of a convenient size
- use SPARQLWrapper to send each query to the Wikidata SPARQL endpoint and parse the results
- write the results to a different CSV file in chunks as they're received
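The batching idea in the first bullet can be sketched like this; the batch size of 50 is an arbitrary choice, picked only so that each generated query stays well below the endpoint's timeout limits:

```python
def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Stand-in for a long list of VIAF IDs read from the CSV.
ids = [str(n) for n in range(125)]

batches = list(batched(ids, 50))
print([len(b) for b in batches])  # → [50, 50, 25]
```

Each batch would then be passed through expand_query, sent to the endpoint (e.g. via SPARQLWrapper), and its rows appended to the output CSV before the next batch is requested.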