I have one column(called 'data') in a dataframe which looks like this, each row has a list of dicts, starting with 2022-01-04, ended today, for example the 1st row is {'value': 18.76, 'date': '2022-01-04'}, {'value': 18.59, 'date': '2022-01-05'}, {'value': 18.99, 'date': '2022-01-06'}...
0 [{'value': 18.76, 'date': '2022-01-04'}, {'val...
1 [{'value': 38.58, 'date': '2022-01-04'}, {'val...
2 [{'value': 37.5, 'date': '2022-01-04'}, {'valu...
3 [{'value': 61.77, 'date': '2022-01-04'}, {'val...
4 [{'value': 110.54, 'date': '2022-01-04'}, {'va...
5 [{'value': 101.71, 'date': '2022-01-04'}, {'va...
6 [{'value': 86.45, 'date': '2022-01-04'}, {'val...
7 [{'value': 97.95, 'date': '2022-01-04'}, {'val...
8 [{'value': 38.39, 'date': '2022-01-04'}, {'val...
9 [{'value': 217.92, 'date': '2022-01-04'}, {'va...
10 [{'value': 86.94, 'date': '2022-01-04'}, {'val...
11 [{'value': 55.2, 'date': '2022-01-04'}, {'valu...
12 [{'value': 138.97, 'date': '2022-01-04'}, {'va...
13 [{'value': 4853125.0, 'date': '2022-01-04'}, {...
14 [{'value': 29.12, 'date': '2022-01-04'}, {'val...
15 [{'value': 90.77, 'date': '2022-01-04'}, {'val...
16 [{'value': 87.15, 'date': '2022-01-04'}, {'val...
I used a line of code which worked before
df[['date','value']] = df['data'].apply(lambda x: [[i['date'],i['value']] for i in x]).explode().apply(pd.Series, index=['date','value'])
but now this line fails and gives
ValueError: cannot reindex on an axis with duplicate labels
Is there an easy solution to solve this issue? as there are 300 dates hence 300 data points for each row, I am not sure which dates may contain duplicate data??
CodePudding user response:
You can try:
df = df.explode("Column1")
df = pd.concat([df, df.pop("Column1").apply(pd.Series)], axis=1)
print(df)
Prints:
value date
0 18.76 2022-01-04
0 18.59 2022-01-05
0 18.99 2022-01-06
1 38.58 2022-01-04
2 37.50 2022-01-04
...and so on.