I extract temperature from a website into a dataframe. It looks like this:
Temp Prec
0 3 / -4 °C -
1 1 / -17 °C -
2 -7 / -18 °C -
3 6 / -8 °C -
4 8 / 1 °C -
5 8 / 0 °C 1.3 mm
6 8 / 0 °C 7.0 mm
7 6 / -1 °C -
8 4 / 0 °C 4.0 mm
9 5 / 2 °C 23.8 mm
10 6 / 1 °C -
11 5 / -1 °C -
12 4 / -1 °C -
13 7 / 0 °C 10.6 mm
14 7 / 1 °C 29.7 mm
Then I use this code to extract the temperature in the format I want:
df2['Temp'] = df2['Temp'].str.extract('(\d+)') + 'C'
and I get this result:
Temp Prec
0 3C -
1 1C -
2 7C -
3 6C -
4 8C -
5 8C 1.3 mm
6 8C 7.0 mm
7 6C -
8 4C 4.0 mm
9 5C 23.8 mm
10 6C -
11 5C -
12 4C -
13 7C 10.6 mm
14 7C 29.7 mm
I have lost the negative sign (like on row 2) when it's a temperature below zero. How can I keep the negative sign?
Answer:
Without regex, split on the last "/" with str.rsplit and take the final piece with .str[-1]:
df["Temp"] = df["Temp"].str.rsplit("/", n=1).str[-1]
As for the regex approach, make the minus sign optional with -? and include °C in the captured group:
df["Temp"] = df["Temp"].str.extract(r"(-?\d+\s*°C)", expand=False)
Output :
print(df)
Temp Prec
0 -4 °C -
1 -17 °C -
2 -18 °C -
3 -8 °C -
4 1 °C -
5 0 °C 1.3 mm
6 0 °C 7.0 mm
7 -1 °C -
8 0 °C 4.0 mm
9 2 °C 23.8 mm
10 1 °C -
11 -1 °C -
12 -1 °C -
13 0 °C 10.6 mm
14 1 °C 29.7 mm
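Note that both snippets above keep the value after the slash, while the code in the question takes the value before it. If the first value is the one you want, in the "3C" style from the question, something like the following might work. This is only a sketch: the sample rows are copied from the question, and the column names first_temp and last_temp are made up for illustration.

```python
import pandas as pd

# A few sample rows copied from the question's table.
df2 = pd.DataFrame({
    "Temp": ["3 / -4 °C", "1 / -17 °C", "-7 / -18 °C", "8 / 0 °C"],
    "Prec": ["-", "-", "-", "1.3 mm"],
})

# -? makes the minus sign optional and \d+ matches one or more digits.
# str.extract keeps the first match, i.e. the value before the slash.
# (first_temp is a hypothetical column name.)
df2["first_temp"] = df2["Temp"].str.extract(r"(-?\d+)", expand=False) + "C"

# Requiring the unit right after the number selects the value after
# the slash instead; only the number itself is captured.
# (last_temp is a hypothetical column name.)
df2["last_temp"] = df2["Temp"].str.extract(r"(-?\d+)\s*°C", expand=False) + "C"

print(df2)
```

Passing expand=False makes str.extract return a Series rather than a one-column DataFrame, so the string concatenation and the column assignment stay straightforward.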