Losing negative sign when extracting data from a dataframe

Time:02-03

I extract temperature from a website into a dataframe. It looks like this:

           Temp      Prec
0     3 / -4 °C         -
1    1 / -17 °C         -
2   -7 / -18 °C         -
3     6 / -8 °C         -
4      8 / 1 °C         -
5      8 / 0 °C   1.3  mm
6      8 / 0 °C   7.0  mm
7     6 / -1 °C         -
8      4 / 0 °C   4.0  mm
9      5 / 2 °C  23.8  mm
10     6 / 1 °C         -
11    5 / -1 °C         -
12    4 / -1 °C         -
13     7 / 0 °C  10.6  mm
14     7 / 1 °C  29.7  mm

Then I use this code to extract the temperature in the format I want:

df2['Temp'] = df2['Temp'].str.extract('(\d+)') + 'C'

and I get this result:

   Temp      Prec
0    3C         -
1    1C         -
2    7C         -
3    6C         -
4    8C         -
5    8C   1.3  mm
6    8C   7.0  mm
7    6C         -
8    4C   4.0  mm
9    5C  23.8  mm
10   6C         -
11   5C         -
12   4C         -
13   7C  10.6  mm
14   7C  29.7  mm

I have lost the negative sign (like on row 2) when it's a temperature below zero. How can I keep the negative sign?

Answer:

Without regex, use rsplit and take the last piece with the str accessor (this keeps the value after the slash, sign included):

df["Temp"] = df["Temp"].str.rsplit("/", n=1).str[-1]
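To try the split approach end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from three rows of the question's table, and the trailing `.str.strip()` is an addition of this sketch (not part of the line above) that drops the space left after the slash.

```python
import pandas as pd

# Three rows copied from the question's table, for illustration only
df = pd.DataFrame({"Temp": ["3 / -4 °C", "-7 / -18 °C", "8 / 1 °C"]})

# Split once from the right on "/", keep the last piece, trim whitespace
low = df["Temp"].str.rsplit("/", n=1).str[-1].str.strip()
print(low.tolist())  # ['-4 °C', '-18 °C', '1 °C']
```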

With the regex approach, allow an optional minus sign with -? and include °C in the captured group:

df["Temp"] = df["Temp"].str.extract(r"(-?\d+\s*°C)", expand=False)

Output:

print(df)

      Temp     Prec
0    -4 °C        -
1   -17 °C        -
2   -18 °C        -
3    -8 °C        -
4     1 °C        -
5     0 °C   1.3 mm
6     0 °C   7.0 mm
7    -1 °C        -
8     0 °C   4.0 mm
9     2 °C  23.8 mm
10    1 °C        -
11   -1 °C        -
12   -1 °C        -
13    0 °C  10.6 mm
14    1 °C  29.7 mm
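
Note that both snippets above keep the second value of each pair (the one after the slash). If the goal is instead the first value in the question's `3C` / `-7C` format, a minimal sketch follows. The three sample rows are copied from the question, and the column names `high` and `low` are illustrative assumptions, not part of the original data.

```python
import pandas as pd

df2 = pd.DataFrame({"Temp": ["3 / -4 °C", "-7 / -18 °C", "8 / 1 °C"]})

# -? allows an optional leading minus; \d+ matches one or more digits.
# extract() returns the first match, i.e. the value before the slash.
high = df2["Temp"].str.extract(r"(-?\d+)", expand=False) + "C"
print(high.tolist())  # ['3C', '-7C', '8C']

# Optionally capture both values as integers via named groups
both = (
    df2["Temp"]
    .str.extract(r"(?P<high>-?\d+)\s*/\s*(?P<low>-?\d+)")
    .astype(int)
)
print(both)
```

Since `extract` returns only the first match, adding `-?` to the question's original pattern is enough to preserve the sign.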