I have a bit complicated looking json file that stores a dictionary.
Here is the dictionary if you want to code on your local machine.
{'https://www.linkedin.com/in/manashi-sherawat-mathur-phd-3a97b69':
{'showallexperiences':
[{'company_url': 'https://www.linkedin.com/company/1612/'},
{'company_url': 'https://www.linkedin.com/search/results/all/?keywords=Independent Pharma/Biotech Professional'], 'showalleducation':
[{'university_url': 'https://www.linkedin.com/company/3555/'}, {'university_url': None}]},
'https://www.linkedin.com/in/baneshwar-singh-6143082b/':
{'showallexperiences':
[{'company_url': 'https://www.linkedin.com/company/166810/'},
{'company_url': 'https://www.linkedin.com/company/166810/'}],
'showalleducation':
[{'university_url': 'https://www.linkedin.com/company/6737/'}, {'university_url': 'https://www.linkedin.com/company/5826549/'}]}}}
The main keys of the dictionary are different Linkedin URLs and their values are also dictionaries each storing company and university URLs belonging to that link.
I need to write the content of that dictionary to an excel file.
Particularly, each different Linkedin URL (the main key)'s info should be written on a separate row. I also hope to add some numbering for the columns of different company's and university's urls.
Can somebody please suggest any solution?
CodePudding user response:
Try:
import pandas as pd
data = {
"https://www.linkedin.com/in/manashi-sherawat-mathur-phd-3a97b69": {
"showallexperiences": [
{"company_url": "https://www.linkedin.com/company/1612/"},
{
"company_url": "https://www.linkedin.com/search/results/all/?keywords=Independent Pharma/Biotech Professional"
},
],
"showalleducation": [
{"university_url": "https://www.linkedin.com/company/3555/"},
{"university_url": None},
],
},
"https://www.linkedin.com/in/baneshwar-singh-6143082b/": {
"showallexperiences": [
{"company_url": "https://www.linkedin.com/company/166810/"},
{"company_url": "https://www.linkedin.com/company/166810/"},
],
"showalleducation": [
{"university_url": "https://www.linkedin.com/company/6737/"},
{"university_url": "https://www.linkedin.com/company/5826549/"},
],
},
}
all_data = []
for k, v in data.items():
company_urls = {
f"exp{i}_company_url": e["company_url"]
for i, e in enumerate(v["showallexperiences"], 1)
}
edu_urls = {
f"edu{i}_uni_url": e["university_url"]
for i, e in enumerate(v["showalleducation"], 1)
}
all_data.append({"linkedin_link": k, **company_urls, **edu_urls})
df = pd.DataFrame(all_data)
print(df.to_markdown(index=False))
df.to_csv("data.csv", index=False)
Prints:
linkedin_link | exp1_company_url | exp2_company_url | edu1_uni_url | edu2_uni_url |
---|---|---|---|---|
|