Home > Software design >  Pandas how can I split rows with multiple entries into separate rows?
Pandas how can I split rows with multiple entries into separate rows?

Time:12-10

I have the following DataFrame:

df
Out[69]:
    A                          B                  C
0   Banana | Apple | Banana    No | Yes | Yes     Yes | No | Yes
1   Banana                     No                 Yes
2   Banana                     No                 No
3   Apple | Banana             Yes | Yes          No | No
4   Apple                      Yes                Yes

The top row (index 0) and fourth row (index 3) represent three and two entries, respectively. I would like to write code that could split each row with multiple entries (delimited by |) into separate rows, so I have one row for each entry.

In this example, I would like to produce the following DataFrame:

    A       B    C
0   Banana  No   Yes
1   Apple   Yes  No
2   Banana  Yes  Yes
3   Banana  No   Yes
4   Banana  No   No
5   Apple   Yes  No
6   Banana  Yes  No
7   Apple   Yes  Yes

or:

    A       B    C
0   Banana  No   Yes
0   Apple   Yes  No
0   Banana  Yes  Yes
1   Banana  No   Yes
2   Banana  No   No
3   Apple   Yes  No
3   Banana  Yes  No
4   Apple   Yes  Yes

I looked into the explode function but it looks like it expects the rows to be in list format.

Here is what happens when I run print(df.head().to_dict()):

{'A': {0: 'Banana | Apple | Banana', 1: 'Banana', 2: 'Banana', 3: 'Apple | Banana', 4: 'Apple'}, 'B': {0: 'No | Yes | Yes', 1: 'No', 2: 'No', 3: 'Yes | Yes', 4: 'Yes'}, 'C': {0: 'Yes | No | Yes', 1: 'Yes', 2: 'No', 3: 'No | No', 4: 'Yes'}}

CodePudding user response:

You can still use explode, just split the strings by |:

df = df.apply(lambda col: col.str.split(r'\s*\|\s*').explode())

Output:

>>> df
        A    B    C
0  Banana   No  Yes
0   Apple  Yes   No
0  Banana  Yes  Yes
1  Banana   No  Yes
2  Banana   No   No
3   Apple  Yes   No
3  Banana  Yes   No
4   Apple  Yes  Yes
  • Related