Home > database >  Splitting a number string in pandas / python
Splitting a number string in pandas / python

Time:09-12

I am looking to split the following column to two columns in my pandas dataframe by splitting on the last 0 in every row.

000012345
000012345
000012345

What I would like it to look like

0000 12345
0000 12345

I've been looking into str.split from can't seem to figure how to approach this as there is no usual delimiter, and I can not figure out how to make it split on the 4th 0.

I have had success with a similar issue previously with the following command, but can not seem to figure it out, as im not looking to split labels, but values in the rows.

df.labels.str.split(':',1).tolist()

CodePudding user response:

Assuming "col" the column, you can split with a lookbehind regex:

df['col'].str.split(r'(?<=^.{4})', expand=True)

regex:

(?<=^.{4})    # match the empty space preceded by the first 4 characters

Or use str.extract:

df['col'].str.extract('(^.{4})(.*)')
# df[['col2', 'col3']] = df['col'].str.extract('(^.{4})(.*)')

Full example:

df[['col2', 'col3']] = df['col'].str.split('(?<=^.{4})', expand=True)

output:

         col  col2   col3
0  000012345  0000  12345
1  000012345  0000  12345
2  000012345  0000  12345

CodePudding user response:

Check below code using string replace

df = pd.DataFrame({'col1':['000012345','000012345','000012345']})

df['col2'] =  df['col1'].astype(int)

df['col3'] = df.apply(lambda row: row['col1'].replace(str(row['col2']),''), axis =1)

print(df)

Output:

        col1   col2  col3
0  000012345  12345  0000
1  000012345  12345  0000
2  000012345  12345  0000
  • Related