I have the following DataFrame (df):
Trends Value
2021-12-13T08:00:00.000Z 45
2021-12-13T07:00:00.000Z 32
2021-12-13T06:42:10.000Z 23
2021-12-13T06:27:00.000Z 45
2021-12-10T05:00:00.000Z 23
I ran the following line:
df['Trends'].str.extract('^(.*:[1-9][1-9].*)$', expand=True)
It returns:
0
NaN
NaN
2021-12-13T06:42:10.000Z
2021-12-13T06:27:00.000Z
NaN
My objective is to use the regex to extract any trends whose minutes and seconds are greater than zero. The regex works (tested) and the line also works, but what I don't understand is why it returns NaN for the rows that do not match.
I looked through several other SO questions and the line is pretty much the same.
My expected outcome:
2021-12-13T06:42:10.000Z
2021-12-13T06:27:00.000Z
CodePudding user response:
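Your solution is close. str.extract returns one row per input row, so rows that do not match are filled with NaN. To keep only the matching rows, build a boolean mask with str.match, then filter: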
df[df.Trends.str.match('^(.*:[1-9][1-9].*)$')].Trends
output:
2 2021-12-13T06:42:10.000Z
3 2021-12-13T06:27:00.000Z
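To make the difference concrete, here is a minimal, self-contained sketch that rebuilds the question's data and compares the two approaches. The variable names (`pattern`, `extracted`, `via_extract`, `via_match`) are only illustrative; the pattern is the one from the question.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Trends": [
        "2021-12-13T08:00:00.000Z",
        "2021-12-13T07:00:00.000Z",
        "2021-12-13T06:42:10.000Z",
        "2021-12-13T06:27:00.000Z",
        "2021-12-10T05:00:00.000Z",
    ],
    "Value": [45, 32, 23, 45, 23],
})

pattern = r'^(.*:[1-9][1-9].*)$'

# str.extract keeps the original index: one row per input row,
# with NaN wherever the pattern did not match.
extracted = df['Trends'].str.extract(pattern, expand=False)

# Option 1: drop the NaN rows after extracting.
via_extract = extracted.dropna()

# Option 2: use the boolean mask from str.match to filter the column.
via_match = df.loc[df['Trends'].str.match(pattern), 'Trends']

print(via_extract)
print(via_match)
```

Both options should leave only the rows at index 2 and 3. Filtering with the mask has the side benefit that the same mask can select the matching rows of the whole DataFrame, not just the Trends column.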
CodePudding user response:
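The previous answer won't work with data like the sample at the end of this answer (where the minutes are 00 but the seconds are not, or vice versa), because [1-9][1-9] rejects any value containing a zero digit, such as 01 or 10. This updated regex handles those cases: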
df[df.Trends.str.match(r'^(?!.*:00:00\..*)(.*:[0-9]+:[0-9]+\..*)$')].Trends
or
df[df.Trends.str.match(r'^(?!.*:00:00\..*)(.*:.*\..*)$')].Trends
or, if the seconds don't matter but any nonzero minute (such as 01) should be selected:
df[df.Trends.str.match(r'^(?!.*:00:\d+\..*)(.*:.*\..*)$')].Trends
Sample data:
Trends Value
2021-12-13T07:00:00.000Z 32
2021-12-13T07:00:01.000Z 32
2021-12-13T07:00:10.000Z 32
2021-12-13T07:01:00.000Z 32
2021-12-13T07:10:00.000Z 32
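As a quick sanity check, the negative-lookahead patterns can be tried against the sample timestamps with the standard `re` module (pandas' str.match applies `re.match` to each element). This is only a sketch: it assumes the quantifiers in the patterns are `+` (`[0-9]+`, `\d+`), and the names `both_zero` and `minute_zero` are made up for illustration.

```python
import re

# Timestamps from the sample data above.
samples = [
    "2021-12-13T07:00:00.000Z",
    "2021-12-13T07:00:01.000Z",
    "2021-12-13T07:00:10.000Z",
    "2021-12-13T07:01:00.000Z",
    "2021-12-13T07:10:00.000Z",
]

# Reject only timestamps where minutes AND seconds are both 00.
both_zero = re.compile(r'^(?!.*:00:00\..*)(.*:[0-9]+:[0-9]+\..*)$')

# Reject every timestamp whose minutes are 00, whatever the seconds.
minute_zero = re.compile(r'^(?!.*:00:\d+\..*)(.*:.*\..*)$')

nonzero_min_or_sec = [s for s in samples if both_zero.match(s)]
nonzero_min = [s for s in samples if minute_zero.match(s)]

print(nonzero_min_or_sec)  # everything except 07:00:00
print(nonzero_min)         # only 07:01:00 and 07:10:00
```

The lookahead does the filtering here; the capturing group after it only needs to accept the rest of the string.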