I have the following DataFrame:
df
Out[69]:
A B C
0 Banana | Apple | Banana No | Yes | Yes Yes | No | Yes
1 Banana No Yes
2 Banana No No
3 Apple | Banana Yes | Yes No | No
4 Apple Yes Yes
The top row (index 0) and fourth row (index 3) represent three and two entries, respectively. I would like to write code that could split each row with multiple entries (delimited by |
) into separate rows, so I have one row for each entry.
In this example, I would like to produce the following DataFrame:
A B C
0 Banana No Yes
1 Apple Yes No
2 Banana Yes Yes
3 Banana No Yes
4 Banana No No
5 Apple Yes No
6 Banana Yes No
7 Apple Yes Yes
or:
A B C
0 Banana No Yes
0 Apple Yes No
0 Banana Yes Yes
1 Banana No Yes
2 Banana No No
3 Apple Yes No
3 Banana Yes No
4 Apple Yes Yes
I looked into the explode
function but it looks like it expects the rows to be in list format.
Here is what happens when I run print(df.head().to_dict())
:
{'A': {0: 'Banana | Apple | Banana', 1: 'Banana', 2: 'Banana', 3: 'Apple | Banana', 4: 'Apple'}, 'B': {0: 'No | Yes | Yes', 1: 'No', 2: 'No', 3: 'Yes | Yes', 4: 'Yes'}, 'C': {0: 'Yes | No | Yes', 1: 'Yes', 2: 'No', 3: 'No | No', 4: 'Yes'}}
CodePudding user response:
You can still use explode
, just split the strings by |
:
df = df.apply(lambda col: col.str.split(r'\s*\|\s*').explode())
Output:
>>> df
A B C
0 Banana No Yes
0 Apple Yes No
0 Banana Yes Yes
1 Banana No Yes
2 Banana No No
3 Apple Yes No
3 Banana Yes No
4 Apple Yes Yes