Python: Slice a column string in Pandas DataFrame

I am trying to modify a column string. I want to remove unwanted numbers in the column and save the modified value as a new column.

Here's an example in SQL:

Using cast(substring(EMP_NM, 0, CHARINDEX(' ', EMP_NM)) as int) I get the result below.

How can I do this in Python so that I get only the "4253332" part of the numbers in a new column?

df['EMP_NM'] = df['EMP_NM'].str.slice(0, 9) -- This does not give the result I want, because the values in the column can look like this:

009201135 0000000000 0000000000 0000000000 0000000000

0006892203 0000000000 0000000000 0000000000 0000000000

Any help would be appreciated.

CodePudding user response：

How about this:

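df['EMP_NM'] = df['EMP_NM'].str.replace('0', '')

Note that this removes every 0 in the string, including zeros inside the number (009201135 becomes 921135), and it leaves the spaces between the groups in place.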

Or, if the other groups can also contain non-zero digits, for example 009201135 0000000000 32331 0000000000 0000000000, this should work better, because it keeps only the first group of each row and strips its leading zeros:

df['EMP_fNM'] = df['EMP_NM'].str.split().str[0].str.lstrip('0')
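Here is a minimal sketch of the split-and-strip idea, using the sample values from the question plus the mixed value mentioned above. The column names EMP_ID and EMP_ID_INT are hypothetical, chosen only for illustration. The integer conversion at the end is one possible way to mirror the SQL cast(... as int).

```python
import pandas as pd

# Sample values from the question, plus the mixed case from this answer.
df = pd.DataFrame({
    "EMP_NM": [
        "009201135 0000000000 0000000000 0000000000 0000000000",
        "0006892203 0000000000 0000000000 0000000000 0000000000",
        "009201135 0000000000 32331 0000000000 0000000000",
    ]
})

# Split each row on whitespace and keep only the first group.
first_group = df["EMP_NM"].str.split().str[0]

# Drop leading zeros only; zeros inside the number are kept.
df["EMP_ID"] = first_group.str.lstrip("0")  # hypothetical column name

# Alternative: convert to integer, similar in spirit to the SQL cast.
df["EMP_ID_INT"] = first_group.astype(int)  # hypothetical column name

print(df[["EMP_ID", "EMP_ID_INT"]])
```

Using lstrip rather than strip matters here: strip('0') would also remove trailing zeros from a value such as 0004253330.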

CodePudding user response：

Try this:


df['EMP_NM'] = df['EMP_NM'].astype(str).str[0:7]

If this data field is read into Python as an integer, the leading zeros are dropped automatically, so you can simply slice to keep only the first 7 characters.

.astype(str) converts the field to a string so it can be sliced. Integers and floats cannot be indexed this way.
.str[0:7] slices each string, keeping the first 7 characters.
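As a rough illustration of this approach, the sketch below assumes the first group was already parsed as integers (the sample data and the EMP_NM_short column name are hypothetical). It also shows what the same slice returns when the column still holds the original strings, since in that case the leading zeros count toward the 7 characters.

```python
import pandas as pd

# Hypothetical data: values already parsed as integers, so leading zeros are gone.
df = pd.DataFrame({"EMP_NM": [9201135, 6892203]})

# Convert to string, then keep the first 7 characters of each value.
df["EMP_NM_short"] = df["EMP_NM"].astype(str).str[0:7]  # hypothetical column name

# If the column still contains the raw strings, the zeros are part of the slice.
raw = pd.Series(["009201135 0000000000", "0006892203 0000000000"])
sliced_raw = raw.astype(str).str[0:7]

print(df["EMP_NM_short"].tolist())
print(sliced_raw.tolist())
```

So the fixed-width slice only matches the desired output when the leading zeros have already been removed and the number is 7 digits long.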