I have a sample JSON file like this
data = {
"type": "video",
"videoID": "vid001",
"links": [
{"type": "video", "videoID": "vid002", "links": []},
{"type": "video",
"videoID": "vid003",
"links": [
{"type": "video", "videoID": "vid004"},
{"type": "video", "videoID": "vid005"},
]
},
{"type": "video", "videoID": "vid006"},
{"type": "video",
"videoID": "vid007",
"links": [
{"type": "video", "videoID": "vid008", "links": [
{"type": "video",
"videoID": "vid009",
"links": [{"type": "video", "videoID": "vid010"}]
}
]}
]},
]
}
i need to take only specific key and values from the json file and convert it into a CSV file
Code: Ref:Extracting Specific Keys/Values From A Messed-Up JSON File (Python)
def extract(data, keys):
out = []
queue = [data]
while len(queue) > 0:
current = queue.pop(0)
if type(current) == dict:
for key in keys:
if key in current:
out.append({key: current[key]})
for val in current.values():
if type(val) in [list, dict]:
queue.append(val)
elif type(current) == list:
queue.extend(current)
return out
x = extract(data, ["videoID","type"])
print(pd.DataFrame.from_dict(x))
When I'm passing 2 values through the extract(), getting NaN in between result
videoID type
0 vid001 NaN
1 NaN video
2 vid002 NaN
3 NaN video
4 vid003 NaN
5 NaN video
6 vid006 NaN
7 NaN video
8 vid007 NaN
9 NaN video
10 vid004 NaN
11 NaN video
12 vid005 NaN
13 NaN video
14 vid008 NaN
15 NaN video
16 vid009 NaN
17 NaN video
18 vid010 NaN
19 NaN video
I need to get an output like the below
videoID type
0 vid001 video
1 vid002 video
2 vid003 video
3 vid004 video
etc...
and convert it into a csv file, can somebody help me to solve this issue
CodePudding user response:
Try recursion:
def get_type_videoid(o):
if isinstance(o, dict):
if "type" in o and "videoID" in o:
yield o["videoID"], o["type"]
for v in o.values():
yield from get_type_videoid(v)
elif isinstance(o, list):
for v in o:
yield from get_type_videoid(v)
df = pd.DataFrame(get_type_videoid(data), columns=["videoID", "type"])
print(df)
Prints:
videoID type
0 vid001 video
1 vid002 video
2 vid003 video
3 vid004 video
4 vid005 video
5 vid006 video
6 vid007 video
7 vid008 video
8 vid009 video
9 vid010 video
CodePudding user response:
Your method seems ok to me. You are just making a mistake in the for key in keys
loop: what you are currently doing is you are creating a single dict ({key: current[key]}
) for every element. So at the end you will have a out
list that is a list of dicts that are not related, in which each videoID
is in a dict and type
is in a different dict.
Like this:
[{'videoID': 'vid001'}, {'type': 'video'}, {'videoID': 'vid002'}, {'type': 'video'}, {'videoID': 'vid003'}, {'type': 'video'}, {'videoID': 'vid006'}, {'type': 'video'}, {'videoID': 'vid007'}, {'type': 'video'}, {'videoID': 'vid004'}, {'type': 'video'}, {'videoID': 'vid005'}, {'type': 'video'}, {'videoID': 'vid008'}, {'type': 'video'}, {'videoID': 'vid009'}, {'type': 'video'}, {'videoID': 'vid010'}, {'type': 'video'}]
What you would want instead is:
[{'videoID': 'vid001', 'type': 'video'}, {'videoID': 'vid002', 'type': 'video'}, {'videoID': 'vid003', 'type': 'video'}, {'videoID': 'vid006', 'type': 'video'}, {'videoID': 'vid007', 'type': 'video'}, {'videoID': 'vid004', 'type': 'video'}, {'videoID': 'vid005', 'type': 'video'}, {'videoID': 'vid008', 'type': 'video'}, {'videoID': 'vid009', 'type': 'video'}, {'videoID': 'vid010', 'type': 'video'}]
where each videoID is related to its type.
To do so you just have to create a dict when looping on keys, add each element to the dict and then append the dict to out
list at the end of the loop on keys.
What I would do is this:
data_couple = {}
for key in keys:
if key in current:
data_couple[key] = current[key]
# out.append({key: current[key]})
out.append(data_couple)``
so the whole extract
function would become:
def extract(data, keys):
out = []
queue = [data]
while len(queue) > 0:
current = queue.pop(0)
if type(current) == dict:
data_couple = {}
for key in keys:
if key in current:
data_couple[key] = current[key]
out.append(data_couple)
for val in current.values():
if type(val) in [list, dict]:
queue.append(val)
elif type(current) == list:
queue.extend(current)
return out
Finally, to write a dict in a csv file I would simply use csv DictWriter:
import csv
def writeToCsv(dictionary, col_name):
with open("file.csv", "w") as f:
wr = csv.DictWriter(f, fieldnames=col_name)
wr.writeheader()
for elem in dictionary:
wr.writerow(elem)
writeToCsv(your_dictionary, ["videoID", "type"])
This would create a CSV with videoID
and type
columns
Alternatively, as suggested by Ivan Calderon answer here you could also use pandas to_csv
method in one line:
you should change your dictionary before:
x = extract(data, ["videoID","type"])
d = {k: [v] for k, v in x.items()}
pd.DataFrame.from_dict(data=d, orient='columns').to_csv('dict_file.csv')
but I am not sure about this...