Given a column of strings in a DataFrame, the following code strips out everything except the digits. What I want instead is to keep just the string part, without the dot, and whenever a cell contains only a number in string form, replace it with the string 'number'. To be clear, the cells in this column have the following values:
'a. 12','b. 75','23', 'c/a 34', '85', 'a 32', 'b 345'
and I want to replace the cell values in this column with the following:
'a', 'b', 'number', 'c/a', 'number', 'a' , 'b'
How do I do that?
import pandas as pd

l2 = ['a. 12', 'b. 75', '23', 'c/a 34', '85', 'a 32', 'b 345']
df = pd.DataFrame({'col1': l2})
# Current attempt: removes every non-digit character, so only the digits remain
df['col1'] = df['col1'].str.replace(r'\D', '', regex=True).astype(str)
print(df)
CodePudding user response:
Going by your example, the rules seem to be (1) replace cells that contain only numbers with 'number' and (2) remove any trailing dot, whitespace, and digits:
df['col1'] = df['col1'].str.replace(r'^[\d\s]+$', 'number', regex=True).str.replace(r'\.?\s*\d*$', '', regex=True)
Output:
     col1
0       a
1       b
2  number
3     c/a
4  number
5       a
6       b
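For reference, the two replacements can be run end to end as a self-contained sketch. It assumes a `+` quantifier in the first pattern and passes `regex=True` explicitly on both calls, since recent pandas versions treat the pattern as a literal string by default. The names `values`, `step1`, and `result` are illustrative only.

```python
import pandas as pd

values = ['a. 12', 'b. 75', '23', 'c/a 34', '85', 'a 32', 'b 345']
df = pd.DataFrame({'col1': values})

# Step 1: cells made up only of digits/whitespace become the literal 'number'.
step1 = df['col1'].str.replace(r'^[\d\s]+$', 'number', regex=True)

# Step 2: strip an optional dot, whitespace, and trailing digits from the end.
result = step1.str.replace(r'\.?\s*\d*$', '', regex=True)

print(result.tolist())
```

The order matters: running step 2 first would reduce the purely numeric cells to empty strings, leaving nothing for step 1 to match.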
CodePudding user response:
Another way, using np.where with pd.Series.str.isnumeric and str.extract:
import numpy as np
df["new"] = np.where(df["col1"].str.isnumeric(), "number", df["col1"].str.extract(r"^([a-z/]*)", expand=False))
print(df)
     col1     new
0   a. 12       a
1   b. 75       b
2      23  number
3  c/a 34     c/a
4      85  number
5    a 32       a
6   b 345       b
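One caveat worth sketching: `str.isnumeric()` is True only when every character in the cell is numeric, so values such as ' 85 ' (padded with spaces) or '1.5' would not be labelled 'number'. The variant below is a hedged sketch that assumes such cells should also count as numbers and builds the mask with `str.fullmatch` instead. The helper name `label_cells` and the extra sample values are hypothetical and not from the question.

```python
import numpy as np
import pandas as pd


def label_cells(col: pd.Series) -> pd.Series:
    # Mask: an optionally padded integer or decimal (assumed definition of "number").
    is_num = col.str.fullmatch(r'\s*\d+(?:\.\d+)?\s*')
    # Leading run of lowercase letters and slashes, e.g. 'c/a' from 'c/a 34'.
    prefix = col.str.extract(r'^([a-z/]*)', expand=False)
    return pd.Series(np.where(is_num, 'number', prefix), index=col.index)


df = pd.DataFrame({'col1': ['a. 12', 'b. 75', '23', 'c/a 34', ' 85 ', '1.5', 'b 345']})
df['new'] = label_cells(df['col1'])
print(df['new'].tolist())
```

With the plain `isnumeric()` mask, the ' 85 ' and '1.5' cells would fall through to the extract branch and come back as empty strings.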