The data I'm working with is the entire list of Yu-Gi-Oh! cards, found at this endpoint: https://db.ygoprodeck.com/api/v7/cardinfo.php

There's a top-level `data` node that I can ignore, but each sub-node immediately under it (`0`, `1`, `2`, ...) is a unique card in the card set. I want to find out how many unique keys there are in this entire dataset. The tricky part is that each card can have different sub-nodes than another card. Eventually I want to put all of this into SQL tables, but for now I need to know all of the keys. Some examples of the keys are `id`, `name`, `archetype`, `atk`, `def`, and `card_sets`. How do I extract a unique list of all keys? I'm looking for the easiest way to get this list. I have experience in Python and T-SQL, but any other language is fine, since my goal is just to look at the list.
Answer:
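I used a recursive generator to solve this problem:

- If the data is a dict, yield each of its keys, then recurse into the corresponding value.
- If the data is a list or tuple (or any other non-string iterable), recurse into each of its elements.
- Strings are also iterable, so they must be excluded: iterating over a one-character string yields that same string again, so the recursion would never terminate.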
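```python
import requests
from collections.abc import Iterable  # `from collections import Iterable` fails on Python 3.10+


def get_key(data):
    """Recursively yield every dict key found anywhere inside `data`."""
    if isinstance(data, dict):
        for k, v in data.items():
            yield k                # the key itself
            yield from get_key(v)  # any keys nested inside its value
    elif isinstance(data, Iterable) and not isinstance(data, str):
        # Lists/tuples: walk each element. Strings are skipped because
        # iterating a one-character string yields itself forever.
        for i in data:
            yield from get_key(i)


def main():
    url = "https://db.ygoprodeck.com/api/v7/cardinfo.php"
    res = requests.get(url)
    res.raise_for_status()         # fail loudly on an HTTP error
    data = res.json()["data"]      # skip the top-level "data" node
    result = set(get_key(data))    # the set removes duplicate keys
    print(result)


if __name__ == '__main__':
    main()
```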
And the output, 32 unique keys in total:

```
{'set_rarity', 'def', 'image_url_small', 'ban_tcg', 'set_rarity_code', 'linkmarkers', 'type', 'coolstuffinc_price', 'cardmarket_price', 'id', 'image_url', 'level', 'name', 'set_code', 'banlist_info', 'desc', 'set_name', 'card_prices', 'attribute', 'linkval', 'ebay_price', 'tcgplayer_price', 'card_sets', 'atk', 'set_price', 'ban_ocg', 'ban_goat', 'archetype', 'amazon_price', 'scale', 'race', 'card_images'}
```
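A flat set of keys cannot tell a top-level card field apart from a key that only appears inside a nested object, and that distinction matters when splitting the data into SQL tables. One option is to record the path at which each key was found. The sketch below is a hypothetical variant of the generator, named `get_key_paths` here, that yields dotted paths; list positions are deliberately dropped so every element of a list shares one path. It runs on a small made-up sample rather than the live API; with real data you would pass it the same `data` list the script above builds.

```python
from collections.abc import Iterable


def get_key_paths(data, prefix=""):
    """Yield dotted key paths such as 'card_sets.set_name'.

    List indices are not included, so all elements of a list
    contribute to the same path as the list itself.
    """
    if isinstance(data, dict):
        for k, v in data.items():
            path = f"{prefix}.{k}" if prefix else k
            yield path
            yield from get_key_paths(v, path)
    elif isinstance(data, Iterable) and not isinstance(data, str):
        for item in data:
            yield from get_key_paths(item, prefix)


# Hypothetical sample shaped like two entries of the API's "data" list
sample = [
    {"id": 1, "name": "A", "card_sets": [{"set_name": "S1", "set_code": "X-001"}]},
    {"id": 2, "name": "B", "atk": 1000, "card_images": [{"image_url": "u"}]},
]

print(sorted(set(get_key_paths(sample))))
```

Each path with a dot in it marks a key that belongs to a nested list or object, which could map to its own child table.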
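Because not every card has the same fields, it can also help to count how many cards contain each top-level key: a key missing from some cards would presumably become a nullable column. A minimal sketch using `collections.Counter`, on a hypothetical three-card sample; `top_level_key_counts` is a name made up for this example, and it only looks at the first level of each card.

```python
from collections import Counter


def top_level_key_counts(cards):
    """Count, for each top-level key, how many cards contain it."""
    counts = Counter()
    for card in cards:
        counts.update(card.keys())  # each key counted once per card
    return counts


# Hypothetical sample; with the real API, pass requests.get(url).json()["data"]
sample = [
    {"id": 1, "name": "A", "atk": 1000, "def": 800},
    {"id": 2, "name": "B", "archetype": "X"},
    {"id": 3, "name": "C", "atk": 0, "def": 0},
]

counts = top_level_key_counts(sample)
for key, n in counts.most_common():
    status = "always present" if n == len(sample) else "optional"
    print(f"{key}: {n}/{len(sample)} ({status})")
```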