How to turn this list into a data frame in Python?

I have a list in Python, shown below, that I would like to turn into a data frame. I tried pd.DataFrame(myList), but the 'origins' column then holds a list. Instead, I would like the origin and quantityLeads keys stored as columns in that same data frame.

myList = [
   {
      "id":3105052,
      "title":"Ebook Relatórios Gerenciais",
      "offering":"Institucional",
      "created_date":"2022-06-28"
      "inserted_date":"2022-06-28",
      "channel":"Social",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"LinkedIn",
            "quantityLeads":"1"
         },
         {
            "origin":"Facebook",
            "quantityLeads":"1"
         }
      ]
   },
   {
      "id":3105052,
      "title":"Ebook Relatórios Gerenciais",
      "offering":"Institucional",
      "inserted_date":"2022-06-28",
      "created_date":"2022-06-28",
      "channel":"Direct",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"Desconhecida",
            "quantityLeads":"2"
         }
      ]
   },
   {
      "id":2918513,
      "title":"Ebook Direct To Consumer",
      "offering":"Supply Chain",
      "created_date":"2022-06-28",
      "inserted_date":"2022-06-28",
      "channel":"Social",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"LinkedIn",
            "quantityLeads":"1"
         }
      ]
   }
]

CodePudding user response：

If "origins" can hold more than one element, you can first explode it so that each element gets its own row, then create the "origin" and "quantityLeads" columns, and then decide what to do with the rest of the dataframe.

import pandas as pd

df = pd.DataFrame(myList)
# ignore_index=True resets the index so the new columns align row by row
df = df.explode('origins', ignore_index=True)
df[['origin', 'quantityLeads']] = pd.DataFrame(df['origins'].tolist())
df = df.drop(columns='origins')

print(df)

Output:

        id                        title       offering created_date  \
0  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28   
1  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28   
2  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28   
3  2918513     Ebook Direct To Consumer   Supply Chain   2022-06-28   

  inserted_date channel  start_date    end_date        origin quantityLeads  
0    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1  
1    2022-06-28  Social  2022-06-28  2022-06-28      Facebook             1  
2    2022-06-28  Direct  2022-06-28  2022-06-28  Desconhecida             2  
3    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1
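
As an alternative to exploding the column, pandas also ships pd.json_normalize, which can unpack a nested list of records directly. The following is only a minimal sketch on a trimmed, hypothetical sample (the name sample and its two kept keys are assumptions made for brevity); with the real data, every top-level key you want to keep would go into meta.

```python
import pandas as pd

# Trimmed, hypothetical sample with the same shape as myList
sample = [
    {"id": 3105052, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                 {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 3105052, "channel": "Direct",
     "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
]

# record_path names the nested list to unpack into rows;
# meta lists the top-level keys to repeat on each of those rows
flat = pd.json_normalize(sample, record_path="origins", meta=["id", "channel"])
print(flat)
```

The nested keys come first in the result, followed by the meta columns, so reorder the columns afterwards if the original layout matters.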

CodePudding user response：

For simplicity, you could flatten each dictionary in place before building the DataFrame. Note that this keeps only the first entry of each "origins" list:

import pandas as pd

for row in myList:
   # Keep only the first entry of the nested list
   row["origin"] = row["origins"][0]["origin"]
   row["quantityLeads"] = row["origins"][0]["quantityLeads"]
   del row["origins"]

df = pd.DataFrame(myList)
print(df)

Output:

        id                        title       offering created_date inserted_date channel  start_date    end_date        origin quantityLeads
0  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1
1  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28    2022-06-28  Direct  2022-06-28  2022-06-28  Desconhecida             2
2  2918513     Ebook Direct To Consumer   Supply Chain   2022-06-28    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1

Just as a side note, for the myList sample above there is a missing comma after the first entry's created_date that's causing an error.

EDIT: If the origins list holds a variable number of items, but each item has the same keys, we can iterate over those as well and give each item its own numbered pair of columns. Rows with fewer items get NaN in the unused columns.

import pandas as pd

for row in myList:
   origins_list = row["origins"]
   counter = 0
   for item in origins_list:
      # Number the columns: origin_0, quantityLeads_0, origin_1, ...
      row["origin_" + str(counter)] = item["origin"]
      row["quantityLeads_" + str(counter)] = item["quantityLeads"]
      counter += 1

   del row["origins"]

df = pd.DataFrame(myList)
print(df)
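
One thing to watch with the loop above: it modifies myList in place, so running it a second time raises a KeyError on the already deleted "origins" key. Below is a minimal sketch of a variant that leaves the input untouched. The helper name flatten_row and the trimmed sample are hypothetical, made up for illustration only.

```python
import pandas as pd

def flatten_row(row):
    """Return a copy of row with each origins entry spread into numbered columns."""
    # Copy every key except the nested list, so the input dict is not changed
    flat = {key: value for key, value in row.items() if key != "origins"}
    for counter, item in enumerate(row.get("origins", [])):
        flat[f"origin_{counter}"] = item["origin"]
        flat[f"quantityLeads_{counter}"] = item["quantityLeads"]
    return flat

# Trimmed, hypothetical sample with the same shape as myList
sample = [
    {"id": 3105052, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                 {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 3105052, "channel": "Direct",
     "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
]

df = pd.DataFrame([flatten_row(row) for row in sample])
print(df)
```

Because the second record has only one origin, its origin_1 and quantityLeads_1 cells come out as NaN, and sample still contains its "origins" lists afterwards.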