Add character to column based on text condition using pandas

Time:11-07

I'm trying to do some data cleaning using pandas. Imagine I have a data frame with a column called "Number" that contains strings like "1203.10", "4221", "3452.11", etc. I want to add an "M" in front of the numbers that have a decimal point and end in zero. In this example, that means turning "1203.10" into "M1203.10".

I know how to obtain a data frame containing the numbers with a point and ending with zero.

Suppose the data frame is called "df".

pointzero = '[0-9]+[.][0-9]+[0]$'
pz = df[df.Number.str.match(pointzero)]

But I'm not sure how to add the "M" at the beginning once I have "pz". The only way I know is a for loop, but I think there must be a better way. Any suggestions would be great!

CodePudding user response:

You can use boolean indexing:

pointzero = '[0-9]+[.][0-9]+[0]$'
m = df.Number.str.match(pointzero)

df.loc[m, 'Number'] = 'M' + df.loc[m, 'Number']

Alternatively, using str.replace and a slightly different regex:

pointzero = '([0-9]+[.][0-9]+[0]$)'
df['Number'] = df['Number'].str.replace(pointzero, r'M\1', regex=True)

Example:

     Number
0  M1203.10
1      4221
2   3452.11
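
For anyone who wants to run both approaches side by side, here is a minimal, self-contained sketch. It assumes the sample values from the question; the names by_mask and by_replace are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data from the question; every value is a string.
df = pd.DataFrame({"Number": ["1203.10", "4221", "3452.11"]})

# One or more digits, a literal point, one or more digits, then a final zero.
pointzero = r"[0-9]+[.][0-9]+[0]$"

# Approach 1: boolean indexing (done on a copy so df stays untouched).
by_mask = df.copy()
m = by_mask["Number"].str.match(pointzero)
by_mask.loc[m, "Number"] = "M" + by_mask.loc[m, "Number"]

# Approach 2: str.replace with a capture group around the same pattern.
by_replace = df.copy()
by_replace["Number"] = by_replace["Number"].str.replace(
    "(" + pointzero + ")", r"M\1", regex=True
)

print(by_mask)
print(by_mask.equals(by_replace))
```

On this sample data both approaches should produce the same frame. They can differ on other inputs, because str.match only matches from the start of each string, while str.replace substitutes wherever the pattern is found.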

CodePudding user response:

You should include a small example DataFrame or Series in your question so that answers can work from it.

Example:

import pandas as pd

s1 = pd.Series(["1203.10", "4221", "3452.11"])
s1

0    1203.10
1       4221
2    3452.11
dtype: object

Build a boolean mask with str.contains and apply it with mask:

cond1 = s1.str.contains('[0-9]+[.][0-9]+[0]$')
s1.mask(cond1, 'M' + s1)

output:

0    M1203.10
1        4221
2     3452.11
dtype: object
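
One thing worth checking before picking between str.match and str.contains: they are not interchangeable for this pattern. The sketch below uses made-up values ("ID1203.10" and "12.0" are assumptions added for illustration, not data from the question) to show where they differ.

```python
import pandas as pd

# Illustrative values only; "ID1203.10" and "12.0" are hypothetical edge cases.
s = pd.Series(["1203.10", "ID1203.10", "12.0", "4221"])
pattern = r"[0-9]+[.][0-9]+[0]$"

# str.match behaves like re.match: the pattern must match from the first character.
matched = s.str.match(pattern)

# str.contains behaves like re.search: the pattern may match anywhere in the string.
contained = s.str.contains(pattern, regex=True)

print(pd.DataFrame({"value": s, "match": matched, "contains": contained}))
```

With these inputs, "ID1203.10" is picked up by str.contains but not by str.match. Neither flags "12.0", because the pattern requires at least one digit between the point and the final zero. If values like "12.0" should also get the prefix, the pattern would need to be loosened.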