I have a column with the values below, and I would like to extract the numbers from them (including decimals).
13
12
13.5
420
0
9.75
007
7
11.27
.10
1776
...10
..11
If I use df["x"] = df["x"].str.extract(r'([0-9]+.*)'), the values that start with "." are dropped from the result. Note that .10, ...10, and ..11 should return 10, 10, and 11 respectively.
CodePudding user response:
It seems that you only need to strip the dots and convert to float:
df['x'].str.strip('.').astype(float)
Output:
0 13.00
1 12.00
2 13.50
3 420.00
4 0.00
5 9.75
6 7.00
7 7.00
8 11.27
9 10.00
10 1776.00
11 10.00
12 11.00
Name: x, dtype: float64
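For reference, here is a self-contained sketch of this approach using the sample values from the question. The column name x_num is only illustrative, and pd.to_numeric with errors='coerce' is swapped in for astype(float) as an assumption on my part, so that any value that is still not numeric after stripping becomes NaN instead of raising an error.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['13', '12', '13.5', '420', '0', '9.75', '007', '7',
          '11.27', '.10', '1776', '...10', '..11']
})

# strip('.') removes dots from both ends of each string, so '...10' becomes '10'
stripped = df['x'].str.strip('.')

# errors='coerce' (an addition, not part of the answer above) turns anything
# that still is not a number into NaN instead of raising
df['x_num'] = pd.to_numeric(stripped, errors='coerce')

print(df['x_num'])
```

Note that strip('.') also removes trailing dots; if only leading dots should go, str.lstrip('.') may be the safer choice.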
CodePudding user response:
To reuse your code, I added strip('.') to remove the leading '.' characters and changed the regex to ([1-9]+.*|0\Z), so that it returns the expected values without leading zeros:
import pandas as pd
df = pd.DataFrame({
    'x': ['13', '12', '13.5', '420', '0', '9.75', '007', '7', '11.27', '.10', '1776', '...10', '..11']
})
# Raw string so that \Z reaches the regex engine unchanged; expand=False returns a Series
df['x'] = df['x'].str.strip('.').str.extract(r'([1-9]+.*|0\Z)', expand=False)
print(df['x'])
Result:
0 13
1 12
2 13.5
3 420
4 0
5 9.75
6 7
7 7
8 11.27
9 10
10 1776
11 10
12 11
Name: x, dtype: object
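One caveat with a pattern that starts at the first non-zero digit: a value such as 0.5 (not in the question's data) would come back as 5. A possible alternative, sketched below under that assumption, is to match the digits and an optional decimal part explicitly and let the float conversion drop the leading zeros. The extra value '0.5' and the column name x_num are hypothetical and only there to illustrate the difference.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['13', '12', '13.5', '420', '0', '9.75', '007', '7',
          '11.27', '.10', '1776', '...10', '..11',
          '0.5']  # hypothetical extra value, not in the question
})

# One or more digits, optionally followed by a dot and more digits.
# expand=False makes str.extract return a Series for the single group.
number = df['x'].str.extract(r'(\d+(?:\.\d+)?)', expand=False)

# Converting to float also drops leading zeros: '007' becomes 7.0
df['x_num'] = number.astype(float)

print(df)
```

Because the pattern never needs to skip zeros, no strip('.') call is required here; the leading dots are simply not part of the match.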