I'm trying to do some data cleaning using pandas. Imagine I have a data frame with a column called "Number" that contains data like: "1203.10", "4221", "3452.11", etc. I want to add an "M" before the numbers that have a point and end with a zero. For this example, that means turning "1203.10" into "M1203.10".
I know how to obtain a data frame containing the numbers with a point and ending with zero.
Suppose the data frame is called "df".
pointzero = '[0-9]+[.][0-9]+[0]$'
pz = df[df.Number.str.match(pointzero)]
But I'm not sure how to add the "M" at the beginning once I have "pz". The only way I know is a for loop, but I think there is a better way. Any suggestions would be great!
CodePudding user response:
You can use boolean indexing:
pointzero = '[0-9]+[.][0-9]+[0]$'
m = df.Number.str.match(pointzero)
df.loc[m, 'Number'] = 'M' + df.loc[m, 'Number']
Alternatively, use str.replace with a slightly different regex that wraps the pattern in a capture group:
pointzero = '([0-9]+[.][0-9]+[0]$)'
df['Number'] = df['Number'].str.replace(pointzero, r'M\1', regex=True)
Example:
Number
0 M1203.10
1 4221
2 3452.11
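For anyone who wants to run both approaches end to end, here is a minimal self-contained sketch. It assumes the sample values from the question; the names by_mask and by_replace are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data and column name taken from the question.
df = pd.DataFrame({"Number": ["1203.10", "4221", "3452.11"]})

# One or more digits, a literal point, one or more digits, then a final zero.
pointzero = r"[0-9]+[.][0-9]+[0]$"

# Approach 1: boolean indexing (done on a copy so both results can be compared).
by_mask = df.copy()
m = by_mask["Number"].str.match(pointzero)
by_mask.loc[m, "Number"] = "M" + by_mask.loc[m, "Number"]

# Approach 2: str.replace with a capture group; \1 re-inserts the matched text.
by_replace = df.copy()
by_replace["Number"] = by_replace["Number"].str.replace(
    r"([0-9]+[.][0-9]+[0]$)", r"M\1", regex=True
)

print(by_mask["Number"].tolist())
print(by_replace["Number"].tolist())
```

One difference worth keeping in mind: str.match only matches from the start of the string, while str.replace substitutes wherever the pattern is found, so the two could diverge on values that have other characters before the digits.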
CodePudding user response:
You should include a sample DataFrame or Series in your question so that it is easier to answer.
Example:
s1 = pd.Series(["1203.10", "4221","3452.11"])
s1
0 1203.10
1 4221
2 3452.11
dtype: object
Use str.contains to build a boolean mask, then apply it with mask:
cond1 = s1.str.contains('[0-9]+[.][0-9]+[0]$')
s1.mask(cond1, 'M' + s1)
output:
0 M1203.10
1 4221
2 3452.11
dtype: object
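Note that mask returns a new Series and leaves s1 unchanged, so the result has to be assigned back to keep it. A minimal sketch of doing that on the question's data frame, assuming the column is called "Number" as in the question (na=False is an optional extra here, so that missing values would count as non-matches):

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"Number": ["1203.10", "4221", "3452.11"]})

# True where the value has a point and ends with zero.
cond = df["Number"].str.contains(r"[0-9]+[.][0-9]+[0]$", na=False)

# mask() replaces values where cond is True; assign the result back to the column.
df["Number"] = df["Number"].mask(cond, "M" + df["Number"])

print(df)
```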