I have a list of lists, a sample of which is pasted below. I would like to convert this to a pandas data frame, but the list contains many duplicates. How would I remove duplicates from a list of lists like this and convert it to a data frame with two columns: timestamp and price?
[[{'timestamp': 1648558320942, 'price': 47876.0},
{'timestamp': 1648558320942, 'price': 47876.0}],
[{'timestamp': 1648558321945, 'price': 47881.0},
{'timestamp': 1648558321945, 'price': 47881.0},
{'timestamp': 1648558321945, 'price': 47881.0}],
[{'timestamp': 1648558326768, 'price': 47876.0}]]
CodePudding user response:
You can flatten the list and drop duplicates from your dataframe.
# import toolboxes
import pandas as pd
from itertools import chain
# get data
data = [[{'timestamp': 1648558320942, 'price': 47876.0},
{'timestamp': 1648558320942, 'price': 47876.0}],
[{'timestamp': 1648558321945, 'price': 47881.0},
{'timestamp': 1648558321945, 'price': 47881.0},
{'timestamp': 1648558321945, 'price': 47881.0}],
[{'timestamp': 1648558326768, 'price': 47876.0}]]
# flatten, create df and drop duplicates
a = list(chain.from_iterable(data))
df = pd.DataFrame(a)
df = df.drop_duplicates()
Output:
print(df)
       timestamp    price
0  1648558320942  47876.0
2  1648558321945  47881.0
5  1648558326768  47876.0
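The output above keeps the original row labels (0, 2, 5) because drop_duplicates does not renumber the rows. If a contiguous index is preferred, a minimal sketch, assuming the same sample data as in the question, is to chain reset_index(drop=True):

```python
import pandas as pd
from itertools import chain

# sample data copied from the question
data = [[{'timestamp': 1648558320942, 'price': 47876.0},
         {'timestamp': 1648558320942, 'price': 47876.0}],
        [{'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0},
         {'timestamp': 1648558321945, 'price': 47881.0}],
        [{'timestamp': 1648558326768, 'price': 47876.0}]]

# flatten the sublists into one list of dicts
flat = list(chain.from_iterable(data))

# drop duplicate rows, then renumber the remaining rows from 0
df = pd.DataFrame(flat).drop_duplicates().reset_index(drop=True)
print(df)
```

Passing drop=True discards the old labels instead of keeping them as an extra column.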
CodePudding user response:
You just need to flatten that list of lists:
import pandas as pd
data = [[{'timestamp': 1648558320942, 'price': 47876.0},
{'timestamp': 1648558320942, 'price': 47876.0}],
[{'timestamp': 1648558321945, 'price': 47881.0},
{'timestamp': 1648558321945, 'price': 47881.0},
{'timestamp': 1648558321945, 'price': 47881.0}],
[{'timestamp': 1648558326768, 'price': 47876.0}]]
newData = []
for each in data:
    newData += each  # extend the flat list with the items of each sublist
# or list comprehension
# newData = [each for v in data for each in v]
df = pd.DataFrame(newData).drop_duplicates()
And as a one-liner:
df = pd.DataFrame([each for v in data for each in v]).drop_duplicates()
Output:
print(df)
       timestamp    price
0  1648558320942  47876.0
2  1648558321945  47881.0
5  1648558326768  47876.0
CodePudding user response:
Quick answer:
pd.DataFrame([item for sublist in my_list for item in sublist]).drop_duplicates()
Explanation:
- Flatten the list of lists into a single list of dicts
- Create a pandas DataFrame from that list
- Remove duplicate rows with drop_duplicates()
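The three steps can be sketched as a small helper. The name flatten_to_frame and its subset parameter are hypothetical, introduced here only for illustration; subset is passed straight through to drop_duplicates so that rows can be compared on chosen columns (for example only timestamp) rather than on all of them.

```python
import pandas as pd

def flatten_to_frame(nested, subset=None):
    """Hypothetical helper: flatten a list of lists of dicts and drop duplicate rows."""
    # step 1: flatten the list of lists
    flat = [item for sublist in nested for item in sublist]
    # step 2: build the DataFrame; step 3: remove duplicate rows
    return pd.DataFrame(flat).drop_duplicates(subset=subset)

# sample data copied from the question
my_list = [[{'timestamp': 1648558320942, 'price': 47876.0},
            {'timestamp': 1648558320942, 'price': 47876.0}],
           [{'timestamp': 1648558321945, 'price': 47881.0},
            {'timestamp': 1648558321945, 'price': 47881.0},
            {'timestamp': 1648558321945, 'price': 47881.0}],
           [{'timestamp': 1648558326768, 'price': 47876.0}]]

df = flatten_to_frame(my_list)                                   # compare whole rows
by_timestamp = flatten_to_frame(my_list, subset=['timestamp'])   # compare timestamps only
print(df)
```

With the default subset=None the result is the same as the quick answer; whether timestamp alone should identify a duplicate is an assumption about the data that the question does not settle.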
CodePudding user response:
import pandas as pd
list_of_dicts = [[{'timestamp': 1648558320942, 'price': 47876.0},
{'timestamp': 1648558320942, 'price': 47876.0}],
[{'timestamp': 1648558321945, 'price': 47881.0},
{'timestamp': 1648558321945, 'price': 47881.0},
{'timestamp': 1648558321945, 'price': 47881.0}],
[{'timestamp': 1648558326768, 'price': 47876.0}]]
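# keep only the first dict of each sublist (this relies on every sublist holding copies of one record)
df = pd.DataFrame([i[0] for i in list_of_dicts])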
print(df)
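Taking i[0] gives the same result as flattening and calling drop_duplicates only when every sublist contains copies of a single record and no record repeats across sublists. A small sketch with made-up data (the sample values below are hypothetical, not taken from the question) shows where the two approaches diverge:

```python
import pandas as pd

# hypothetical sample: the second sublist holds two distinct records,
# and the third sublist repeats the record from the first one
sample = [
    [{'timestamp': 1, 'price': 10.0}, {'timestamp': 1, 'price': 10.0}],
    [{'timestamp': 2, 'price': 11.0}, {'timestamp': 3, 'price': 12.0}],
    [{'timestamp': 1, 'price': 10.0}],
]

# first element of each sublist: misses timestamp 3 and keeps timestamp 1 twice
first_only = pd.DataFrame([group[0] for group in sample])

# flatten, then drop duplicate rows: every distinct record appears exactly once
flattened = pd.DataFrame(
    [item for group in sample for item in group]
).drop_duplicates()

print(first_only)
print(flattened)
```

For the data shown in the question both approaches return the same three rows, so the choice only matters if the real list can look like the hypothetical sample.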