I have the following dataframe:
list1 = [1, 6, 7, [46, 56, 49], 45, [15, 10, 12]]
list2 = [[49, 57, 45], 3, 7, 8, [16, 19, 12], 41]
data = {'A':list1,
'B': list2}
data = pd.DataFrame(data)
I can explode the dataframe using this piece of code:
data.explode('A').explode('B')
But when I run the following to do the same operation, a ValueError is raised:
data.explode(['A', 'B'])
ValueError Traceback (most recent call last)
<ipython-input-97-efafc6c7cbfa> in <module>
5 'B': list2}
6 data = pd.DataFrame(data)
----> 7 data.explode(['A', 'B'])
~\AppData\Roaming\Python\Python38\site-packages\pandas\core\frame.py in explode(self, column, ignore_index)
9033 for c in columns[1:]:
9034 if not all(counts0 == self[c].apply(mylen)):
-> 9035 raise ValueError("columns must have matching element counts")
9036 result = DataFrame({c: df[c].explode() for c in columns})
9037 result = df.drop(columns, axis=1).join(result)
ValueError: columns must have matching element counts
Can anyone explain why?
CodePudding user response:
df.explode(["A", "B"])
and df.explode("A").explode("B")
do not do the same thing. It seems that you are aiming to get all the combinations, whereas the multi-column explode addresses a different scenario: one where the lists in your columns are paired, so every exploded column must hold the same number of elements in each row. You can see the rationale in the original GitHub feature request. This seems to have been chosen to avoid duplicating values in one of the columns.
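To make the difference concrete, here is a minimal sketch on a small made-up frame (the `paired` data is an illustrative assumption, not the question's data, and a pandas version that accepts a list of columns in `explode` is assumed):

```python
import pandas as pd

# Illustrative data: in every row, A and B hold the same number of elements
paired = pd.DataFrame({"A": [[1, 2], 3], "B": [[10, 20], 30]})

# Multi-column explode: element i of A stays paired with element i of B
zipped = paired.explode(["A", "B"])

# Chained explode: every element of A is combined with every element of B
crossed = paired.explode("A").explode("B")

print(zipped)   # 3 rows: (1, 10), (2, 20), (3, 30)
print(crossed)  # 5 rows: (1, 10), (1, 20), (2, 10), (2, 20), (3, 30)
```

The chained form repeats values of A for each element of B, which is the duplication the multi-column form avoids.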
The feature request links to a GitHub gist/notebook that explores how explode could be implemented, but it does not appear to handle exploding lists of mismatched lengths in parallel.
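The mismatch behind the ValueError can be checked directly on the question's data. The sketch below is only an approximation of the check visible in the traceback; `element_count` is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

list1 = [1, 6, 7, [46, 56, 49], 45, [15, 10, 12]]
list2 = [[49, 57, 45], 3, 7, 8, [16, 19, 12], 41]
data = pd.DataFrame({"A": list1, "B": list2})

def element_count(value):
    # Hypothetical helper: a non-empty list counts its elements, anything else counts as 1
    return len(value) if isinstance(value, list) and len(value) > 0 else 1

# Per-row element counts for each column
counts = pd.DataFrame({c: data[c].apply(element_count) for c in ["A", "B"]})

# Rows where the counts differ are the ones that make explode(['A', 'B']) fail
mismatched = counts[counts["A"] != counts["B"]]
print(mismatched)
```

Rows 0, 3, 4 and 5 pair a scalar with a three-element list, so the counts differ and the multi-column explode refuses to guess how to align them.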
CodePudding user response:
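Try this if it works in your case. Note that it flattens each column independently and pairs the values by position, which works here only because both columns flatten to the same length (10 values each); it does not produce the same rows as the chained explode.
import numpy as np
import pandas as pd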
data = pd.DataFrame({'A' : np.hstack(list1), 'B' : np.hstack(list2)})
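For comparison, a small sketch of what the hstack approach yields next to the chained explode on the question's data. The variable names `flat` and `chained` are assumptions introduced for illustration:

```python
import numpy as np
import pandas as pd

list1 = [1, 6, 7, [46, 56, 49], 45, [15, 10, 12]]
list2 = [[49, 57, 45], 3, 7, 8, [16, 19, 12], 41]

# hstack flattens each list on its own; values are then paired by position
flat = pd.DataFrame({"A": np.hstack(list1), "B": np.hstack(list2)})

# Chained explode keeps each original row together and builds all combinations within it
chained = pd.DataFrame({"A": list1, "B": list2}).explode("A").explode("B")

print(len(flat))     # 10 rows
print(len(chained))  # 14 rows
```

The flattened frame no longer tracks which original row a value came from, so whether it is suitable depends on whether that row association matters.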