I am looking to split the following column to two columns in my pandas dataframe by splitting on the last 0 in every row.
000012345
000012345
000012345
What I would like it to look like
0000 12345
0000 12345
I've been looking into str.split from can't seem to figure how to approach this as there is no usual delimiter, and I can not figure out how to make it split on the 4th 0.
I have had success with a similar issue previously with the following command, but can not seem to figure it out, as im not looking to split labels, but values in the rows.
df.labels.str.split(':',1).tolist()
CodePudding user response:
Assuming "col" the column, you can split
with a lookbehind regex:
df['col'].str.split(r'(?<=^.{4})', expand=True)
regex:
(?<=^.{4}) # match the empty space preceded by the first 4 characters
Or use str.extract
:
df['col'].str.extract('(^.{4})(.*)')
# df[['col2', 'col3']] = df['col'].str.extract('(^.{4})(.*)')
Full example:
df[['col2', 'col3']] = df['col'].str.split('(?<=^.{4})', expand=True)
output:
col col2 col3
0 000012345 0000 12345
1 000012345 0000 12345
2 000012345 0000 12345
CodePudding user response:
Check below code using string replace
df = pd.DataFrame({'col1':['000012345','000012345','000012345']})
df['col2'] = df['col1'].astype(int)
df['col3'] = df.apply(lambda row: row['col1'].replace(str(row['col2']),''), axis =1)
print(df)
Output:
col1 col2 col3
0 000012345 12345 0000
1 000012345 12345 0000
2 000012345 12345 0000