I have pandas data frame like this
Name Col1
Foo [A, C, B, D, E]
Bar [G, M, O]
Baz [E, B]
I want to change it to:
Name New_Col1 New_Col2
Foo A C
Foo C B
Foo B D
Foo D E
Bar G M
Bar M O
Baz E B
How can I do that?
CodePudding user response:
Do two explode
operations and then concat
the results:
s = df.set_index('Name').Col1.str
pd.concat([s[:-1].explode(),
s[1:].explode()],
axis=1)\
.reset_index()
Name Col1 Col1
0 Foo A C
1 Foo C B
2 Foo B D
3 Foo D E
4 Bar G M
5 Bar M O
6 Baz E B
CodePudding user response:
Using explode
and GroupBy.shift
:
s = df['Col1'].explode()
df.drop(columns='Col1').join(
pd.concat([s, s.groupby(level=0).shift(-1)], axis=1)
.set_axis(['New_Col1', 'New_Col2'], axis=1)
.dropna()
)
output:
Name New_Col1 New_Col2
0 Foo A C
0 Foo C B
0 Foo B D
0 Foo D E
1 Bar G M
1 Bar M O
2 Baz E B
With the approach, you can easily generalize to more combinations, example with 3:
s = df['Col1'].explode()
n = 3
df.drop(columns='Col1').join(
pd.concat([s.groupby(level=0).shift(-x) for x in range(n)], axis=1)
.set_axis([f'New_Col{i 1}' for i in range(n)], axis=1)
.dropna()
)
output:
Name New_Col1 New_Col2 New_Col3
0 Foo A C B
0 Foo C B D
0 Foo B D E
1 Bar G M O
2 Baz NaN NaN NaN