Python: replace cell values in a DataFrame column with part of a string

Time:10-12

Given a column of strings in a DataFrame, the code below strips every non-digit character, so only the digits are left. What I want instead is to keep just the text part, without the dot, and whenever a cell contains only a number in string form, replace it with the string 'number'. To be clear, the cells in this column have the following values:

'a. 12','b. 75','23', 'c/a 34', '85', 'a 32', 'b 345'

and I want to replace the cell values in this column with the following:

'a', 'b', 'number', 'c/a', 'number', 'a', 'b'

How do I do that?

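import pandas as pd

l2 = ['a. 12', 'b. 75', '23', 'c/a 34', '85', 'a 32', 'b 345']
df = pd.DataFrame({'col1': l2})

# Removes every non-digit character, so only the digits remain.
# regex=True is passed explicitly because recent pandas versions
# treat the pattern as a literal string by default.
df['col1'] = df['col1'].str.replace(r'\D', '', regex=True).astype(str)
print(df)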

CodePudding user response:

Going by your example, the goal seems to be (1) replace cells that contain only a number with 'number' and (2) remove the trailing dot/space/digits from the rest:

df['col1'] = df['col1'].str.replace(r'^[\d\s]+$', 'number', regex=True).str.replace(r'\.?\s*\d*$', '', regex=True)

output:

     col1
0       a
1       b
2  number
3     c/a
4  number
5       a
6       b
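
To see what each of the two patterns does, here is a minimal sketch that applies the same regular expressions to plain Python strings with the standard `re` module instead of pandas. The helper name `clean` is made up for this illustration and is not part of the answer above.

```python
import re

values = ['a. 12', 'b. 75', '23', 'c/a 34', '85', 'a 32', 'b 345']


def clean(value: str) -> str:
    """Hypothetical helper mirroring the two chained str.replace calls."""
    # Step 1: a cell made up only of digits/whitespace becomes 'number'.
    value = re.sub(r'^[\d\s]+$', 'number', value)
    # Step 2: drop an optional dot, optional whitespace and trailing digits
    # at the end of the string; 'number' has none, so it is left unchanged.
    return re.sub(r'\.?\s*\d*$', '', value)


cleaned = [clean(v) for v in values]
print(cleaned)
```

The order matters: if the trailing digits were stripped first, the number-only cells would become empty strings and the first pattern would no longer match them.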

CodePudding user response:

Another way using np.where with pd.Series.str.isnumeric and extract:

import numpy as np

df["new"] = np.where(df["col1"].str.isnumeric(), "number", df["col1"].str.extract(r"^([a-z/]*)", expand=False))

print(df)

     col1     new
0   a. 12       a
1   b. 75       b
2      23  number
3  c/a 34     c/a
4      85  number
5    a 32       a
6   b 345       b
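
For reference, a self-contained sketch of this approach, assuming pandas and NumPy are installed. The second column, `new2`, is a variant that is not taken from the answers: it assumes that every cell without a leading run of lowercase letters or slashes should count as a number, so it swaps `*` for `+` in the pattern and fills the resulting missing values with 'number'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ['a. 12', 'b. 75', '23', 'c/a 34', '85', 'a 32', 'b 345']})

# Approach from the answer: isnumeric() flags digit-only cells,
# extract() keeps the leading run of lowercase letters and slashes.
prefix = df["col1"].str.extract(r"^([a-z/]*)", expand=False)
df["new"] = np.where(df["col1"].str.isnumeric(), "number", prefix)

# Variant (an assumption, not from the answers): with '+' the pattern
# fails on digit-only cells, extract() returns NaN, and fillna() labels them.
df["new2"] = df["col1"].str.extract(r"^([a-z/]+)", expand=False).fillna("number")

print(df)
```

Both columns agree on the sample data. They can differ on other inputs: a cell such as 'A 12' gives an empty string with the first approach and 'number' with the variant, because the character class only covers lowercase letters and '/'.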