Pandas str.extract returning NaN

Time: 03-15

I have the following df:

Trends                       Value   
2021-12-13T08:00:00.000Z       45
2021-12-13T07:00:00.000Z       32
2021-12-13T06:42:10.000Z       23
2021-12-13T06:27:00.000Z       45
2021-12-10T05:00:00.000Z       23

I ran the following line:

df['Trends'].str.extract('^(.*:[1-9][1-9].*)$', expand=True)

It returns:

                          0
0                       NaN
1                       NaN
2  2021-12-13T06:42:10.000Z
3  2021-12-13T06:27:00.000Z
4                       NaN

My objective is to use the regex to extract any trends whose minutes or seconds are greater than zero. The regex works (tested) and the line also works, but what I don't understand is why it returns NaN for the rows that don't match. I looked through several other SO questions and the line is pretty much the same.

My expected outcome:

2021-12-13T06:42:10.000Z  
2021-12-13T06:27:00.000Z   
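A minimal reproduction of the behaviour asked about, using the question's data: `str.extract` always returns one output row per input row, aligned on the original index, and uses NaN to mark rows where the pattern found nothing. The names `extracted` and `no_match` are illustrative only.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({
    "Trends": [
        "2021-12-13T08:00:00.000Z",
        "2021-12-13T07:00:00.000Z",
        "2021-12-13T06:42:10.000Z",
        "2021-12-13T06:27:00.000Z",
        "2021-12-10T05:00:00.000Z",
    ],
    "Value": [45, 32, 23, 45, 23],
})

# str.extract keeps every input row (and its index label); rows where the
# pattern does not match are filled with NaN instead of being dropped
extracted = df["Trends"].str.extract(r"^(.*:[1-9][1-9].*)$", expand=True)
print(extracted)

# Boolean flags for the rows that did not match
no_match = extracted[0].isna()
print(no_match.tolist())  # [True, True, False, False, True]
```

So the NaN rows are not an error: they are placeholders that keep the result the same length as the input.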

Answer:

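Your solution is close. str.extract returns one row for every input row and fills the rows without a match with NaN, so the NaN values are expected. To keep only the matching rows, build a boolean mask with str.match and filter with it: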

df[df.Trends.str.match('^(.*:[1-9][1-9].*)$')].Trends

output:

2    2021-12-13T06:42:10.000Z
3    2021-12-13T06:27:00.000Z
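A self-contained sketch comparing the mask approach above with simply dropping the NaN rows that `str.extract` produces; both keep the original index labels 2 and 3. The variable names `pattern`, `matched` and `kept` are illustrative, not part of either answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Trends": [
        "2021-12-13T08:00:00.000Z",
        "2021-12-13T07:00:00.000Z",
        "2021-12-13T06:42:10.000Z",
        "2021-12-13T06:27:00.000Z",
        "2021-12-10T05:00:00.000Z",
    ],
    "Value": [45, 32, 23, 45, 23],
})
pattern = r"^(.*:[1-9][1-9].*)$"

# Option 1: boolean mask from str.match, used to select rows of the column
matched = df.loc[df["Trends"].str.match(pattern), "Trends"]

# Option 2: keep str.extract (expand=False gives a Series for one group)
# and drop the NaN placeholders it inserts for non-matching rows
kept = df["Trends"].str.extract(pattern, expand=False).dropna()

print(matched)
print(kept)
```

The mask version is usually preferable when you want whole rows (including `Value`), since the same mask can index the full DataFrame.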

Answer:

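The previous answer won't work for timestamps where any digit of the minutes or seconds is 0 (for example 07:00:01 or 07:10:00 in the sample data below), because [1-9][1-9] requires two non-zero digits right after a colon. This updated regex, which rejects only timestamps ending in :00:00, does work: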

df[df.Trends.str.match(r'^(?!.*:00:00\..*)(.*:[0-9]+:[0-9]+\..*)$')].Trends

or

df[df.Trends.str.match(r'^(?!.*:00:00\..*)(.*:.*\..*)$')].Trends

or, if the seconds don't matter and any non-zero minute (e.g. 01) should be selected:

df[df.Trends.str.match(r'^(?!.*:00:\d+\..*)(.*:.*\..*)$')].Trends
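As a non-regex alternative (not part of either answer), the same two rules can be sketched by parsing the strings with `pd.to_datetime` and testing the `minute` and `second` fields directly. This assumes every value is a valid ISO-8601 timestamp; fractional seconds are ignored, as they are by the patterns above. The names `either_nonzero` and `minute_nonzero` are hypothetical.

```python
import pandas as pd

# Same timestamps as this answer's sample data
df = pd.DataFrame({
    "Trends": [
        "2021-12-13T07:00:00.000Z",
        "2021-12-13T07:00:01.000Z",
        "2021-12-13T07:00:10.000Z",
        "2021-12-13T07:01:00.000Z",
        "2021-12-13T07:10:00.000Z",
    ],
    "Value": [32, 32, 32, 32, 32],
})

# Parse once; the trailing "Z" gives UTC-aware timestamps
ts = pd.to_datetime(df["Trends"])

# Minute or second non-zero (the rule of the first two patterns)
either_nonzero = df.loc[(ts.dt.minute != 0) | (ts.dt.second != 0), "Trends"]

# Minute non-zero, whatever the seconds (the rule of the last pattern)
minute_nonzero = df.loc[ts.dt.minute != 0, "Trends"]

print(either_nonzero)
print(minute_nonzero)
```

Working on parsed timestamps avoids having to reason about which colon a pattern anchors to, at the cost of a parsing step.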

Sample data:

Trends                       Value   
2021-12-13T07:00:00.000Z       32
2021-12-13T07:00:01.000Z       32
2021-12-13T07:00:10.000Z       32
2021-12-13T07:01:00.000Z       32
2021-12-13T07:10:00.000Z       32
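A quick check of each pattern against the sample data above, listing the index labels of the rows it keeps. The dictionary keys are illustrative labels only. The original [1-9][1-9] pattern keeps none of these rows, because every minute or second value here contains a 0 digit.

```python
import pandas as pd

df = pd.DataFrame({
    "Trends": [
        "2021-12-13T07:00:00.000Z",
        "2021-12-13T07:00:01.000Z",
        "2021-12-13T07:00:10.000Z",
        "2021-12-13T07:01:00.000Z",
        "2021-12-13T07:10:00.000Z",
    ],
    "Value": [32, 32, 32, 32, 32],
})

patterns = {
    "first answer": r"^(.*:[1-9][1-9].*)$",
    "strict digits": r"^(?!.*:00:00\..*)(.*:[0-9]+:[0-9]+\..*)$",
    "loose": r"^(?!.*:00:00\..*)(.*:.*\..*)$",
    "minute only": r"^(?!.*:00:\d+\..*)(.*:.*\..*)$",
}

# For each pattern, collect the index labels of the rows str.match accepts
selected = {
    name: df[df["Trends"].str.match(p)].index.tolist()
    for name, p in patterns.items()
}

for name, rows in selected.items():
    print(f"{name}: {rows}")
```

Raw strings (r"...") are used so that `\.` and `\d` reach the regex engine unchanged instead of triggering Python's invalid-escape warning.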