Explode raises ValueError: columns must have matching element counts

Time:01-09

I have the following dataframe:

import pandas as pd

list1 = [1, 6, 7, [46, 56, 49], 45, [15, 10, 12]]
list2 = [[49, 57, 45], 3, 7, 8, [16, 19, 12], 41]

data = {'A':list1,
        'B': list2}
data = pd.DataFrame(data)

I can explode the dataframe using this piece of code:

data.explode('A').explode('B')

but when I run this one, which I expected to do the same operation, a ValueError is raised:

data.explode(['A', 'B'])


ValueError                                Traceback (most recent call last)
<ipython-input-97-efafc6c7cbfa> in <module>
      5         'B': list2}
      6 data = pd.DataFrame(data)
----> 7 data.explode(['A', 'B'])

~\AppData\Roaming\Python\Python38\site-packages\pandas\core\frame.py in explode(self, column, ignore_index)
   9033             for c in columns[1:]:
   9034                 if not all(counts0 == self[c].apply(mylen)):
-> 9035                     raise ValueError("columns must have matching element counts")
   9036             result = DataFrame({c: df[c].explode() for c in columns})
   9037         result = df.drop(columns, axis=1).join(result)

ValueError: columns must have matching element counts

Can anyone explain why?

CodePudding user response:

df.explode(["A", "B"]) and df.explode("A").explode("B") do not do the same thing. It seems that you are aiming to get all the combinations, whereas the multi-column explode addresses a different scenario: one where the lists in your columns are paired element by element. You can see the rationale in the original GitHub feature request. This design seems to have been chosen to avoid duplicating values in one of the columns.

In the feature request there is a link to a GitHub gist/notebook that explores how explode could be implemented, but the authors do not seem to have found a way to explode lists of mismatched lengths in parallel.
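
To make the difference concrete, here is a small sketch using the data from the question. The helper n_items and the paired frame are made up for illustration (they are not part of pandas), and the multi-column call assumes pandas 1.3 or later, where explode accepts a list of columns.

```python
import pandas as pd

list1 = [1, 6, 7, [46, 56, 49], 45, [15, 10, 12]]
list2 = [[49, 57, 45], 3, 7, 8, [16, 19, 12], 41]
data = pd.DataFrame({'A': list1, 'B': list2})

# Chained explode: every element of A is combined with every element of B
# within the same original row, so a row yields len(A) * len(B) rows.
chained = data.explode('A').explode('B')

# Hypothetical helper (not a pandas function): treat a scalar as one element.
def n_items(value):
    return len(value) if isinstance(value, list) else 1

# Per-row element counts; multi-column explode needs these to agree per row.
counts = pd.DataFrame({col: data[col].map(n_items) for col in ['A', 'B']})
mismatched = counts[counts['A'] != counts['B']]

# Illustrative frame whose lists are paired element by element,
# which is the case the multi-column form is meant for.
paired = pd.DataFrame({'A': [[1, 2], [3]], 'B': [['x', 'y'], ['z']]})
paired_exploded = paired.explode(['A', 'B'])

print(len(chained))
print(mismatched)
print(paired_exploded)
```

In the question's data, rows 0, 3, 4 and 5 pair a scalar with a three-element list, which is why data.explode(['A', 'B']) refuses to run, while the chained version quietly repeats the scalar for each list element.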

CodePudding user response:

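Try this if it works in your case. Note that it flattens each column independently, so it only works when both columns flatten to the same total length, and the original row pairing is not preserved.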

import numpy as np
data = pd.DataFrame({'A' : np.hstack(list1), 'B' : np.hstack(list2)})