I have a list in python which is showing like this just below. I would like to turn it into a data frame. I tried it: pd.DataFrame(myList), however the 'origins' column stores a list, however I would like to store the origin and quantityLeads keys in that same dataframe
myList = [
{
"id":3105052,
"title":"Ebook Relat�rios Gerenciais",
"offering":"Institucional",
"created_date":"2022-06-28"
"inserted_date":"2022-06-28",
"channel":"Social",
"start_date":"2022-06-28",
"end_date":"2022-06-28",
"origins":[
{
"origin":"LinkedIn",
"quantityLeads":"1"
},
{
"origin":"Facebook",
"quantityLeads":"1"
}
]
},
{
"id":3105052,
"title":"Ebook Relat�rios Gerenciais",
"offering":"Institucional",
"inserted_date":"2022-06-28",
"created_date":"2022-06-28",
"channel":"Direct",
"start_date":"2022-06-28",
"end_date":"2022-06-28",
"origins":[
{
"origin":"Desconhecida",
"quantityLeads":"2"
}
]
},
{
"id":2918513,
"title":"Ebook Direct To Consumer",
"offering":"Supply Chain",
"created_date":"2022-06-28",
"inserted_date":"2022-06-28",
"channel":"Social",
"start_date":"2022-06-28",
"end_date":"2022-06-28",
"origins":[
{
"origin":"LinkedIn",
"quantityLeads":"1"
}
]
}
]
CodePudding user response:
If you have more than one element in "origins" you may first explode, create "origin", "quantityLeads" and then decide what to do with rest of the dataframe.
df = pd.DataFrame(myList)
df = df.explode('origins')
df[['origin', 'quantityLeads']] = pd.DataFrame(df['origins'].tolist())
df.drop('origins', axis=1, inplace=True)
print(df):
id title offering created_date \
0 3105052 Ebook Relat�rios Gerenciais Institucional 2022-06-28
1 3105052 Ebook Relat�rios Gerenciais Institucional 2022-06-28
2 2918513 Ebook Direct To Consumer Supply Chain 2022-06-28
inserted_date channel start_date end_date origin quantityLeads
0 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
1 2022-06-28 Direct 2022-06-28 2022-06-28 Desconhecida 2
2 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
CodePudding user response:
In the pursuit of simplicity you could just flatten the dictionary structures with something like:
for row in myList:
row["origin"] = row["origins"][0]["origin"]
row["quantityLeads"] = row["origins"][0]["quantityLeads"]
del row["origins"]
df = pd.DataFrame(myList)
print(df)
Output:
id title offering created_date inserted_date channel start_date end_date origin quantityLeads
0 3105052 Ebook Relat�rios Gerenciais Institucional 2022-06-28 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
1 3105052 Ebook Relat�rios Gerenciais Institucional 2022-06-28 2022-06-28 Direct 2022-06-28 2022-06-28 Desconhecida 2
2 2918513 Ebook Direct To Consumer Supply Chain 2022-06-28 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
Just as a side note, for the myList sample above there is a missing comma after the first entry's created_date that's causing an error.
EDIT: If there are a variable number of items in the origins list, but each item has the same keys then we could iterate over those as well.
for row in myList:
origins_list = row["origins"]
counter = 0
for item in origins_list:
row["origin_" str(counter)] = item["origin"]
row["quantityLeads_" str(counter)] = item["quantityLeads"]
counter = 1
del row["origins"]
df = pd.DataFrame(myList)
print(df)