Exploding nested lists using Pandas Series keeps failing

Time:08-01

I have not used pandas explode before. I get the gist of DataFrame.explode, but I heard that pd.Series.explode is useful when only some columns hold nested lists. However, I keep getting: KeyError: "None of ['city'] are in the columns". Yet 'city' is defined in keys:

import pandas as pd

keys = ["city", "temp"]
values = [["chicago", "london", "berlin"], [[32, 30, 28], [39, 40, 25], [33, 34, 35]]]
df = pd.DataFrame({"keys": keys, "values": values})
# This is the line that raises the KeyError quoted above
df2 = df.set_index(['city']).apply(pd.Series.explode).reset_index()

The desired output is:

city / temp
chicago / 32
chicago / 30
chicago / 28

etc.

I would appreciate an expert weighing in as to why this throws an error, and a fix, thank you.

CodePudding user response:

The problem comes from how you define df:

df = pd.DataFrame({"keys":keys,"values":values})

This actually gives you the following dataframe:

   keys                                      values
0  city                   [chicago, london, berlin]
1  temp  [[32, 30, 28], [39, 40, 25], [33, 34, 35]]
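
One way to confirm this is to inspect the column labels directly. The sketch below reuses the keys and values from the question; the variable names columns and error_message are only illustrative. It shows that 'city' ends up as a cell value rather than a column label, which is why set_index(['city']) raises the KeyError:

```python
import pandas as pd

keys = ["city", "temp"]
values = [["chicago", "london", "berlin"], [[32, 30, 28], [39, 40, 25], [33, 34, 35]]]

# Same construction as in the question: two columns named "keys" and "values"
df = pd.DataFrame({"keys": keys, "values": values})
columns = list(df.columns)
print(columns)

# 'city' is a value inside the "keys" column, not a column label,
# so using it as an index key fails
try:
    df.set_index(["city"])
    error_message = None
except KeyError as exc:
    error_message = str(exc)
print(error_message)
```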

You probably meant:

df = pd.DataFrame(dict(zip(keys, values)))

Which gives you:

      city          temp
0  chicago  [32, 30, 28]
1   london  [39, 40, 25]
2   berlin  [33, 34, 35]
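
With the dataframe built this way, the set_index / pd.Series.explode chain from the question should also run. This is a minimal sketch assuming the same data; it passes the dict literally instead of using zip, which should be equivalent here:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["chicago", "london", "berlin"],
    "temp": [[32, 30, 28], [39, 40, 25], [33, 34, 35]],
})

# 'city' is now a real column, so it can be moved into the index;
# every remaining column is exploded, then 'city' becomes a column again
df2 = df.set_index(["city"]).apply(pd.Series.explode).reset_index()
print(df2)
```

This form is mostly useful when several columns hold lists of matching length, since every non-index column is exploded in one go.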

You can then use explode:

print(df.explode('temp'))

Output:

      city temp
0  chicago   32
0  chicago   30
0  chicago   28
1   london   39
1   london   40
1   london   25
2   berlin   33
2   berlin   34
2   berlin   35
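
Note that the index labels repeat (0, 0, 0, 1, ...) and the exploded column keeps object dtype. If a fresh index and integer temperatures are wanted, something like the following sketch may help; it assumes pandas 1.1 or later for ignore_index, and the name tidy is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["chicago", "london", "berlin"],
    "temp": [[32, 30, 28], [39, 40, 25], [33, 34, 35]],
})

# ignore_index=True renumbers the rows 0..n-1 after exploding;
# astype converts the object-dtype column to integers
tidy = df.explode("temp", ignore_index=True).astype({"temp": "int64"})
print(tidy)
```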