I am currently working on a project to make a database of professors. I was able to loop through each link in a for loop and get such data:
{"total": 1, "offset": 0, "data": [{"authorId": "118985833", "name": "Krystle K. Madrid"}]}
{"total": 1, "offset": 0, "data": [{"authorId": "107707217", "name": "S. N. Kirnon"}]}
{"total": 5, "offset": 0, "data": [{"authorId": "121754802", "name": "Jason L. Jarvis"}, {"authorId": "143879405", "name": "J. Jarvis"}, {"authorId": "145088127", "name": "J. Jarvis"}, {"authorId": "31259897", "name": "J. W. Jarvis"}, {"authorId": "38535862", "name": "J. D. Jarvis"}]}
I am trying to find a way to make a excel file with columns AuthorID and Name. Please let me know if anyone can help me how I can do this using Python.
CodePudding user response:
For example, you could do something like the following:
dics = [
{"total": 1, "offset": 0, "data": [{"authorId": "118985833", "name": "Krystle K. Madrid"}]},
{"total": 1, "offset": 0, "data": [{"authorId": "107707217", "name": "S. N. Kirnon"}]},
{"total": 5, "offset": 0, "data": [{"authorId": "121754802", "name": "Jason L. Jarvis"}, {"authorId": "143879405", "name": "J. Jarvis"}, {"authorId": "145088127", "name": "J. Jarvis"}, {"authorId": "31259897", "name": "J. W. Jarvis"}, {"authorId": "38535862", "name": "J. D. Jarvis"}]}
]
df = pd.DataFrame(columns = ['AuthorID','Name'])
for d1 in dics:
for d2 in d1["data"]:
df = pd.concat([df,
pd.DataFrame({"AuthorID":[d2["authorId"]],
"Name":[d2["name"]]})],
ignore_index=True)
The result:
AuthorID Name
0 118985833 Krystle K. Madrid
1 107707217 S. N. Kirnon
2 121754802 Jason L. Jarvis
3 143879405 J. Jarvis
4 145088127 J. Jarvis
5 31259897 J. W. Jarvis
6 38535862 J. D. Jarvis
A similar loop with the same result:
for d1 in dics:
d_new = {"AuthorID":[d2["authorId"] for d2 in d1["data"]],
"Name":[d2["name"] for d2 in d1["data"]]}
df = pd.concat([df,pd.DataFrame(d_new)],ignore_index=True)
CodePudding user response:
With Pandas:
import pandas as pd
df = pd.DataFrame(
[
data
for link in links
for data in link["data"]
]
)
df.rename(columns={"authorId": "AuthorId", "name": "Name"}).to_csv("data.csv")