Home > database >  dataframe: parse a column containing list of dicts: Traceback ValueError: cannot reindex on an axis
dataframe: parse a column containing list of dicts: Traceback ValueError: cannot reindex on an axis

Time:12-15

I have one column(called 'data') in a dataframe which looks like this, each row has a list of dicts, starting with 2022-01-04, ended today, for example the 1st row is {'value': 18.76, 'date': '2022-01-04'}, {'value': 18.59, 'date': '2022-01-05'}, {'value': 18.99, 'date': '2022-01-06'}...

0     [{'value': 18.76, 'date': '2022-01-04'}, {'val...
1     [{'value': 38.58, 'date': '2022-01-04'}, {'val...
2     [{'value': 37.5, 'date': '2022-01-04'}, {'valu...
3     [{'value': 61.77, 'date': '2022-01-04'}, {'val...
4     [{'value': 110.54, 'date': '2022-01-04'}, {'va...
5     [{'value': 101.71, 'date': '2022-01-04'}, {'va...
6     [{'value': 86.45, 'date': '2022-01-04'}, {'val...
7     [{'value': 97.95, 'date': '2022-01-04'}, {'val...
8     [{'value': 38.39, 'date': '2022-01-04'}, {'val...
9     [{'value': 217.92, 'date': '2022-01-04'}, {'va...
10    [{'value': 86.94, 'date': '2022-01-04'}, {'val...
11    [{'value': 55.2, 'date': '2022-01-04'}, {'valu...
12    [{'value': 138.97, 'date': '2022-01-04'}, {'va...
13    [{'value': 4853125.0, 'date': '2022-01-04'}, {...
14    [{'value': 29.12, 'date': '2022-01-04'}, {'val...
15    [{'value': 90.77, 'date': '2022-01-04'}, {'val...
16    [{'value': 87.15, 'date': '2022-01-04'}, {'val...

I used a line of code which worked before

df[['date','value']] = df['data'].apply(lambda x: [[i['date'],i['value']] for i in x]).explode().apply(pd.Series, index=['date','value'])

but now this line fails and gives

ValueError: cannot reindex on an axis with duplicate labels

Is there an easy solution to solve this issue? as there are 300 dates hence 300 data points for each row, I am not sure which dates may contain duplicate data??

CodePudding user response:

You can try:

df = df.explode("Column1")
df = pd.concat([df, df.pop("Column1").apply(pd.Series)], axis=1)

print(df)

Prints:

         value        date
0        18.76  2022-01-04
0        18.59  2022-01-05
0        18.99  2022-01-06
1        38.58  2022-01-04
2        37.50  2022-01-04

...and so on.
  • Related