I have the following input file in csv
A,B,C,D
1,2,|3|4|5|6|7|8,9
11,12,|13|14|15|16|17|18,19
How do I split column C right in the middle into two new rows with additional column E where the first half of the split get "0" in Column E and the second half get "1" in Column E?
A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1
Thank you
CodePudding user response:
If I understand you correctly, you can use str.split
on column 'C', then .explode()
the column and join it again:
df["C"] = df["C"].apply(
lambda x: [
(vals := x.strip(" |").split("|"))[: len(vals) // 2],
vals[len(vals) // 2 :],
]
)
df["E"] = df["C"].apply(lambda x: range(len(x)))
df = df.explode(["C", "E"])
df["C"] = "|" df["C"].apply("|".join)
print(df.to_csv(index=False))
Prints:
A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1
CodePudding user response:
Using a regex and str.findall
to break the string, then explode
and Groupby.cumcount
:
(df.assign(C=df['C'].str.findall('(?:\|[^|]*){3}'))
.explode('C')
.assign(E=lambda d: d.groupby(level=0).cumcount())
#.to_csv('out.csv', index=False)
)
Output (before CSV export):
A B C D E
0 1 2 |3|4|5 9 0
0 1 2 |6|7|8 9 1
1 11 12 |13|14|15 19 0
1 11 12 |16|17|18 19 1
Output CSV:
A,B,C,D,E
1,2,|3|4|5,9,0
1,2,|6|7|8,9,1
11,12,|13|14|15,19,0
11,12,|16|17|18,19,1