Input
Column
0 2 mm
1 3 kg
2 4 m
3 name
4 2 mm
5 3 mph
6 full
7 left
I need to remove the units from this column. I tried:
df["Column"] = df["Column"].replace(r"\D", "", regex=True)
It gives the wrong output: the text-only rows (name, full, left) are emptied as well.
Expected Output:
Column
0 2
1 3
2 4
3 name
4 2
5 3
6 full
7 left
CodePudding user response:
You can use
df["Column"] = df["Column"].str.replace(r'(\d)\s*[a-zA-Z]+$', r'\1', regex=True)
Regex details:
(\d) - Group 1 (the \1 numbered backreference in the replacement pattern refers to this group's value): any digit
\s* - zero or more whitespace characters
[a-zA-Z]+ - one or more ASCII letters
$ - end of string.
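To make the pattern above easy to try, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample values in the question; it assumes every cell is a string.

```python
import pandas as pd

# Sample data copied from the question (all values assumed to be strings).
df = pd.DataFrame(
    {"Column": ["2 mm", "3 kg", "4 m", "name", "2 mm", "3 mph", "full", "left"]}
)

# Drop a trailing run of letters only when a digit comes right before it
# (optionally separated by whitespace). Rows with no digit are left unchanged.
df["Column"] = df["Column"].str.replace(r"(\d)\s*[a-zA-Z]+$", r"\1", regex=True)

print(df["Column"].tolist())
```

Because the letters must follow a digit, rows such as name or full never match and are returned as they are.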
CodePudding user response:
You can still use your replace approach, as long as you keep the original value wherever nothing is left after stripping:
s = df.Column.replace('[^0-9]+','',regex=True)
df.Column = df.Column.mask(s!='',s)
Out[27]:
0 2
1 3
2 4
3 name
4 2
5 3
6 full
7 left
Name: Column, dtype: object
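A self-contained sketch of the replace-then-mask idea, again using the question's sample values as assumed string input:

```python
import pandas as pd

# Sample data copied from the question (all values assumed to be strings).
df = pd.DataFrame(
    {"Column": ["2 mm", "3 kg", "4 m", "name", "2 mm", "3 mph", "full", "left"]}
)

# Strip every run of non-digit characters; text-only rows become "".
s = df["Column"].replace(r"[^0-9]+", "", regex=True)

# Where something is left after stripping, take the stripped value;
# otherwise keep the original text.
df["Column"] = df["Column"].mask(s != "", s)

print(df["Column"].tolist())
```

Note that this removes all non-digits, not only a trailing unit, so a value such as 1.5 kg would become 15. It fits data shaped like the sample, where the number is a plain integer.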
CodePudding user response:
You can use str.extract: if the row begins with a number (^\d+), capture it; otherwise (|) keep the entire row (.*).
df['Column'] = df['Column'].str.extract(r'(^\d+|.*)')
print(df)
# Output
Column
0 2
1 3
2 4
3 name
4 2
5 3
6 full
7 left
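A runnable sketch of the str.extract variant on the same sample data. Passing expand=False is an addition here, not part of the answer above; it makes str.extract return a Series for the single capture group, which assigns cleanly to one column.

```python
import pandas as pd

# Sample data copied from the question (all values assumed to be strings).
df = pd.DataFrame(
    {"Column": ["2 mm", "3 kg", "4 m", "name", "2 mm", "3 mph", "full", "left"]}
)

# The alternation is tried left to right: leading digits if the row starts
# with a number, otherwise the whole row.
# expand=False returns a Series instead of a one-column DataFrame.
df["Column"] = df["Column"].str.extract(r"(^\d+|.*)", expand=False)

print(df["Column"].tolist())
```

This keeps only the leading integer, so it assumes the number always comes first in the cell.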