How to turn this list into a data frame? I would like to know how to do this in python?

Time:06-30

I have a list in Python like the one shown below, and I would like to turn it into a data frame. I tried pd.DataFrame(myList), but the 'origins' column then holds a list. I would like the origin and quantityLeads keys to be stored as columns in that same dataframe.

myList = [
   {
      "id":3105052,
      "title":"Ebook Relatórios Gerenciais",
      "offering":"Institucional",
      "created_date":"2022-06-28"
      "inserted_date":"2022-06-28",
      "channel":"Social",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"LinkedIn",
            "quantityLeads":"1"
         },
         {
            "origin":"Facebook",
            "quantityLeads":"1"
         }
      ]
   },
   {
      "id":3105052,
      "title":"Ebook Relatórios Gerenciais",
      "offering":"Institucional",
      "inserted_date":"2022-06-28",
      "created_date":"2022-06-28",
      "channel":"Direct",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"Desconhecida",
            "quantityLeads":"2"
         }
      ]
   },
   {
      "id":2918513,
      "title":"Ebook Direct To Consumer",
      "offering":"Supply Chain",
      "created_date":"2022-06-28",
      "inserted_date":"2022-06-28",
      "channel":"Social",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"LinkedIn",
            "quantityLeads":"1"
         }
      ]
   }
]

User response:

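If an entry can have more than one element in "origins", first explode that column so each origin gets its own row, then create the "origin" and "quantityLeads" columns and decide what to do with the rest of the dataframe. Passing ignore_index=True (pandas 1.1 or later) renumbers the exploded rows; without it, the duplicated index labels would misalign the new columns.

import pandas as pd

df = pd.DataFrame(myList)
# One row per element of each "origins" list, with a fresh 0..n-1 index
df = df.explode('origins', ignore_index=True)
# Expand each origin dict into its own columns
df[['origin', 'quantityLeads']] = pd.DataFrame(df['origins'].tolist())
df = df.drop('origins', axis=1)

print(df)

Output:

        id                        title       offering created_date  \
0  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28   
1  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28   
2  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28   
3  2918513     Ebook Direct To Consumer   Supply Chain   2022-06-28   

  inserted_date channel  start_date    end_date        origin quantityLeads  
0    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1  
1    2022-06-28  Social  2022-06-28  2022-06-28      Facebook             1  
2    2022-06-28  Direct  2022-06-28  2022-06-28  Desconhecida             2  
3    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1  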
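
As an alternative to explode, pandas' json_normalize can expand a nested list of records in a single call. What follows is only a minimal sketch: the name sample and its reduced set of keys are illustrative stand-ins for myList, not part of the question.

```python
import pandas as pd

# Trimmed-down stand-in for myList (only a few of the original keys are kept).
sample = [
    {"id": 3105052, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                 {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 3105052, "channel": "Direct",
     "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
    {"id": 2918513, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"}]},
]

# record_path names the nested list to expand into rows;
# meta lists the parent keys to repeat on each of those rows.
flat = pd.json_normalize(sample, record_path="origins", meta=["id", "channel"])
print(flat)
```

Every key you want to carry over from the parent dictionary has to be listed in meta, so for the full myList you would add title, offering, and the date fields there as well.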

User response:

For simplicity, you could flatten the dictionary structures in place, keeping only the first entry of each origins list, with something like:

for row in myList:
   row["origin"] = row["origins"][0]["origin"]
   row["quantityLeads"] = row["origins"][0]["quantityLeads"]
   del row["origins"]

df = pd.DataFrame(myList)
print(df)

Output:

        id                        title       offering created_date inserted_date channel  start_date    end_date        origin quantityLeads
0  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1
1  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28    2022-06-28  Direct  2022-06-28  2022-06-28  Desconhecida             2
2  2918513     Ebook Direct To Consumer   Supply Chain   2022-06-28    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1

Just as a side note, for the myList sample above there is a missing comma after the first entry's created_date that's causing an error.

EDIT: If the origins list has a variable number of items, but each item has the same keys, then we could iterate over those as well, giving each item its own numbered pair of columns.

for row in myList:
   origins_list = row["origins"]
   counter = 0
   for item in origins_list:
      row["origin_" + str(counter)] = item["origin"]
      row["quantityLeads_" + str(counter)] = item["quantityLeads"]
      counter += 1

   del row["origins"]

df = pd.DataFrame(myList)
print(df)
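
Note that the loops above modify myList in place and delete "origins", so running them a second time on the same list raises a KeyError. A minimal sketch of a variant that builds new dictionaries instead is shown here. The names flatten_origins and sample are hypothetical, used only for illustration, and sample keeps just a few of the original keys.

```python
import pandas as pd

def flatten_origins(rows):
    """Return new dicts with each origins entry spread into numbered columns."""
    flat_rows = []
    for row in rows:
        # Copy every key except "origins" so the input is left untouched.
        flat = {k: v for k, v in row.items() if k != "origins"}
        for i, item in enumerate(row.get("origins", [])):
            flat[f"origin_{i}"] = item["origin"]
            flat[f"quantityLeads_{i}"] = item["quantityLeads"]
        flat_rows.append(flat)
    return flat_rows

# Trimmed-down stand-in for myList.
sample = [
    {"id": 3105052, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                 {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 3105052, "channel": "Direct",
     "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
]

df = pd.DataFrame(flatten_origins(sample))
print(df)
```

Rows with fewer origins than the longest list end up with NaN in the unused origin_N and quantityLeads_N columns.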