convert list of lists to pandas data frame

Time:03-30

I have a list of lists, a sample of which is pasted below. I would like to convert this to a pandas data frame but the list contains many duplicates. How would I remove duplicates from a list of lists like this and convert to a data frame with two columns: timestamp and price?

[[{'timestamp': 1648558320942, 'price': 47876.0},
  {'timestamp': 1648558320942, 'price': 47876.0}],
 [{'timestamp': 1648558321945, 'price': 47881.0},
  {'timestamp': 1648558321945, 'price': 47881.0},
  {'timestamp': 1648558321945, 'price': 47881.0}],
 [{'timestamp': 1648558326768, 'price': 47876.0}]]

CodePudding user response:

You can flatten the list and drop duplicates from your dataframe.

# import toolboxes
import pandas as pd
from itertools import chain

# get data
data = [[{'timestamp': 1648558320942, 'price': 47876.0},
      {'timestamp': 1648558320942, 'price': 47876.0}],
     [{'timestamp': 1648558321945, 'price': 47881.0},
      {'timestamp': 1648558321945, 'price': 47881.0},
      {'timestamp': 1648558321945, 'price': 47881.0}],
     [{'timestamp': 1648558326768, 'price': 47876.0}]]

# flatten, create df and drop duplicates
a = list(chain.from_iterable(data))
df = pd.DataFrame(a)
df = df.drop_duplicates()

Output:

print(df)
       timestamp    price
0  1648558320942  47876.0
2  1648558321945  47881.0
5  1648558326768  47876.0
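
Note that the index in this output keeps the original row positions (0, 2, 5). If you would rather have a consecutive index, one option is the ignore_index argument of drop_duplicates (available in pandas 1.0 and later). This is only a sketch on the sample data from the question:

```python
import pandas as pd
from itertools import chain

data = [[{'timestamp': 1648558320942, 'price': 47876.0},
         {'timestamp': 1648558320942, 'price': 47876.0}],
        [{'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0}],
        [{'timestamp': 1648558326768, 'price': 47876.0}]]

# flatten the nested lists into one list of dicts
flat = list(chain.from_iterable(data))

# drop duplicate rows and renumber the remaining rows 0..n-1
df = pd.DataFrame(flat).drop_duplicates(ignore_index=True)
print(df)
```

On older pandas versions, df.drop_duplicates().reset_index(drop=True) should give the same result.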

CodePudding user response:

You just need to flatten the list of lists:

import pandas as pd

data = [[{'timestamp': 1648558320942, 'price': 47876.0},
  {'timestamp': 1648558320942, 'price': 47876.0}],
 [{'timestamp': 1648558321945, 'price': 47881.0},
  {'timestamp': 1648558321945, 'price': 47881.0},
  {'timestamp': 1648558321945, 'price': 47881.0}],
 [{'timestamp': 1648558326768, 'price': 47876.0}]]

newData = []
for each in data:
    newData += each

# or list comprehension
# newData = [each for v in data for each in v]


df = pd.DataFrame(newData).drop_duplicates()

And as a one-liner:

df = pd.DataFrame([each for v in data for each in v]).drop_duplicates()

Output:

print(df)
       timestamp    price
0  1648558320942  47876.0
2  1648558321945  47881.0
5  1648558326768  47876.0

CodePudding user response:

Quick answer:

pd.DataFrame([item for sublist in my_list for item in sublist]).drop_duplicates()

Explanation:

  1. Flatten list of lists
  2. Create pandas DataFrame
  3. Remove duplicates
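
The three steps can also be written out one at a time, which makes it easier to see what each one does. This is a sketch on the sample data from the question; my_list stands in for your own list, and the intermediate names flat and df are only illustrative:

```python
import pandas as pd

my_list = [[{'timestamp': 1648558320942, 'price': 47876.0},
            {'timestamp': 1648558320942, 'price': 47876.0}],
           [{'timestamp': 1648558321945, 'price': 47881.0},
            {'timestamp': 1648558321945, 'price': 47881.0},
            {'timestamp': 1648558321945, 'price': 47881.0}],
           [{'timestamp': 1648558326768, 'price': 47876.0}]]

# 1. Flatten list of lists: one flat list holding all six dicts
flat = [item for sublist in my_list for item in sublist]
print(len(flat))   # 6

# 2. Create pandas DataFrame: the dict keys become the columns
df = pd.DataFrame(flat)
print(df.shape)    # (6, 2)

# 3. Remove duplicates: rows that match in every column are dropped
df = df.drop_duplicates()
print(df.shape)    # (3, 2)
```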

CodePudding user response:

import pandas as pd
list_of_dicts = [[{'timestamp': 1648558320942, 'price': 47876.0},
  {'timestamp': 1648558320942, 'price': 47876.0}],
 [{'timestamp': 1648558321945, 'price': 47881.0},
  {'timestamp': 1648558321945, 'price': 47881.0},
  {'timestamp': 1648558321945, 'price': 47881.0}],
 [{'timestamp': 1648558326768, 'price': 47876.0}]]
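# keep only the first dict of each inner list
# (assumes every inner list holds copies of a single record)
df = pd.DataFrame([i[0] for i in list_of_dicts])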
print(df)
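
Taking i[0] works on the sample because each inner list only repeats one record. If an inner list could hold different records, the result would differ from flattening and dropping duplicates. A small sketch with made-up values (the mixed list below is hypothetical, not data from the question):

```python
import pandas as pd

# Hypothetical input: the second inner list holds two different records
mixed = [
    [{'timestamp': 1, 'price': 10.0}, {'timestamp': 1, 'price': 10.0}],
    [{'timestamp': 2, 'price': 11.0}, {'timestamp': 3, 'price': 12.0}],
]

# first element of each inner list: the record with timestamp 3 is lost
first_only = pd.DataFrame([sub[0] for sub in mixed])

# flatten, then drop duplicates: every distinct record is kept
flattened = pd.DataFrame([d for sub in mixed for d in sub]).drop_duplicates()

print(len(first_only))  # 2
print(len(flattened))   # 3
```

An empty inner list would also raise an IndexError with the i[0] approach, while the flattening version simply skips it.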