Home > database >  How to split a row into two rows in python based on delimiter in Python
How to split a row into two rows in python based on delimiter in Python

Time:07-06

I have the following input file in csv

A,B,C,D
1,2,|3|4|5|6|7|8,9
11,12,|13|14|15|16|17|18,19

How do I split column C right in the middle into two new rows with additional column E where the first half of the split get "0" in Column E and the second half get "1" in Column E?

A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1

Thank you

CodePudding user response:

If I understand you correctly, you can use str.split on column 'C', then .explode() the column and join it again:

df["C"] = df["C"].apply(
    lambda x: [
        (vals := x.strip(" |").split("|"))[: len(vals) // 2],
        vals[len(vals) // 2 :],
    ]
)
df["E"] = df["C"].apply(lambda x: range(len(x)))
df = df.explode(["C", "E"])
df["C"] = "|"   df["C"].apply("|".join)

print(df.to_csv(index=False))

Prints:

A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1

CodePudding user response:

Using a regex and str.findall to break the string, then explode and Groupby.cumcount:

(df.assign(C=df['C'].str.findall('(?:\|[^|]*){3}'))
   .explode('C')
   .assign(E=lambda d: d.groupby(level=0).cumcount())
   #.to_csv('out.csv', index=False)
 )

Output (before CSV export):

    A   B          C   D  E
0   1   2     |3|4|5   9  0
0   1   2     |6|7|8   9  1
1  11  12  |13|14|15  19  0
1  11  12  |16|17|18  19  1

Output CSV:

A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1
  • Related