Retaining text based on phrases in a pandas dataframe & removing all other text

Time:12-07

I have a column in my dataframe that contains text like:

Sunny, with a high near 82. Light and variable wind becoming northwest 5 to 7 mph in the afternoon.

but sometimes contains text like:

A 50 percent chance of showers.  Partly sunny, with a high near 61.

I want to manipulate it so that the temperature value (i.e., the 82 or 61) is retained while all other information is removed, so it would become "82" or "61". I cannot do this with a fixed index, since the length of each entry varies, as does the number of digits in the temperature.

I want to use phrases like "high near", "low near", etc. to parse the string and find the temperature value. Is there a pythonically pleasing way of accomplishing this?

CodePudding user response:

Try this:

df['temperature'] = df['text'].str.extract(r'(?:high|low) near (\d+)')[0]

Output:

>>> df
                                                text temperature
0  Sunny, with a high near 82. Light and variable...          82
1  A 50 percent chance of showers.  Partly sunny,...          61
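
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the column is named text, as in the snippet above. The third row is a made-up example added only to show what happens when neither phrase is present, and the conversion to a nullable integer column is an optional extra step, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        "Sunny, with a high near 82. Light and variable wind becoming northwest 5 to 7 mph in the afternoon.",
        "A 50 percent chance of showers.  Partly sunny, with a high near 61.",
        "Mostly clear overnight.",  # hypothetical row with no "high near"/"low near" phrase
    ]
})

# expand=False returns a Series when the pattern has a single capture group
temps = df["text"].str.extract(r"(?:high|low) near (\d+)", expand=False)

# Rows without a match come back as missing values; the nullable Int64 dtype
# keeps them as <NA> instead of forcing the whole column to float
df["temperature"] = pd.to_numeric(temps).astype("Int64")

print(df["temperature"])
```

Rows where the phrase is missing end up as missing values rather than raising an error, so it is worth checking for them before doing arithmetic on the column.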

CodePudding user response:

You could use a regex with pandas such as near (\d+), which matches the digits that follow "near".
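
A rough sketch of that idea, using made-up forecast strings. The shorter pattern does not record whether the number was a high or a low, so a variant with named groups (the names kind and temp are just illustrative) is shown next to it in case that distinction matters.

```python
import pandas as pd

# Hypothetical sample data in the same style as the question
forecasts = pd.Series([
    "Partly sunny, with a high near 61.",
    "Mostly cloudy, with a low near 45.",
])

# Shorter pattern: any run of digits directly after "near "
temps = forecasts.str.extract(r"near (\d+)", expand=False)

# Named groups become column names, so the high/low label is kept as well
parts = forecasts.str.extract(r"(?P<kind>high|low) near (?P<temp>\d+)")

print(temps)
print(parts)
```

Note that the extracted values are strings; convert them with pd.to_numeric if you need numbers.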
