I have the following pandas DataFrame df:
timestamp col1
2021-01-11 11:00 0
2021-01-11 12:00 0
2021-01-11 13:00 1
2021-01-11 14:00 1
2021-01-11 15:00 0
I need to get the timestamp of the first row where col1 is equal to 1. The expected answer is 2021-01-11 13:00.
This is my current solution:
first = None
for index, row in df.iterrows():
    if row["col1"] == 1:
        if not first:
            first = row["timestamp"]
            break
How can I simplify it and make it faster?
CodePudding user response:
Solutions if at least one value matches:
If there are only 0 and 1 values, use Series.idxmax:
out = df.loc[df['col1'].idxmax(),'timestamp']
Or, if values other than 0 and 1 are possible, compare with 1 first:
out = df.loc[df['col1'].eq(1).idxmax(),'timestamp']
Or create a DatetimeIndex first:
out = df.set_index('timestamp')['col1'].idxmax()
print (out)
2021-01-11 13:00:00
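To make the three variants above easy to try, here is a minimal self-contained sketch. The DataFrame is reconstructed from the question, and it assumes the timestamp column has been parsed with pd.to_datetime (the question does not say whether it holds strings or datetimes). The names out_a, out_b and out_c are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question; parsing the timestamps
# with pd.to_datetime is an assumption.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-11 11:00",
        "2021-01-11 12:00",
        "2021-01-11 13:00",
        "2021-01-11 14:00",
        "2021-01-11 15:00",
    ]),
    "col1": [0, 0, 1, 1, 0],
})

# idxmax returns the label of the first maximum, which is the first 1
# when the column only holds 0 and 1.
out_a = df.loc[df["col1"].idxmax(), "timestamp"]

# Comparing with 1 first gives a boolean mask, so other values
# (2, -1, ...) cannot be mistaken for the maximum.
out_b = df.loc[df["col1"].eq(1).idxmax(), "timestamp"]

# With the timestamps as the index, idxmax returns the timestamp itself.
out_c = df.set_index("timestamp")["col1"].eq(1).idxmax()

print(out_a, out_b, out_c)
```

All three avoid the Python-level loop of iterrows, which is where most of the speed-up comes from on larger frames.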
Solution for any values: if there is no match, idxmax returns the first index label (every value in the mask is False), so test for a match with any first:
print (df)
timestamp col1
0 2021-01-11 11:00:00 0
1 2021-01-11 12:00:00 0
2 2021-01-11 13:00:00 0
3 2021-01-11 14:00:00 0
4 2021-01-11 15:00:00 0
out = df.set_index('timestamp')['col1'].eq(1).idxmax()
print (out)
2021-01-11 11:00:00
s = df.set_index('timestamp')['col1'].eq(1)
out = s.idxmax() if s.any() else None
print (out)
None
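The guarded version can be wrapped in a small helper. This is only a sketch: the function name first_timestamp_where and its parameters are hypothetical, and it assumes the DataFrame index has unique labels (with duplicate labels, df.loc would return a Series instead of a single value).

```python
import pandas as pd

def first_timestamp_where(df, column="col1", value=1, ts_column="timestamp"):
    """Return ts_column from the first row where column == value, else None.

    Hypothetical helper; assumes the index labels of df are unique.
    """
    mask = df[column].eq(value)
    # idxmax on an all-False mask would return the first label,
    # so check for a match before using it.
    if not mask.any():
        return None
    return df.loc[mask.idxmax(), ts_column]

timestamps = pd.to_datetime([
    "2021-01-11 11:00",
    "2021-01-11 12:00",
    "2021-01-11 13:00",
])
df_hit = pd.DataFrame({"timestamp": timestamps, "col1": [0, 0, 1]})
df_miss = pd.DataFrame({"timestamp": timestamps, "col1": [0, 0, 0]})

hit = first_timestamp_where(df_hit)
miss = first_timestamp_where(df_miss)
print(hit, miss)
```

Returning None mirrors the answer above; a caller that prefers an exception could raise one in the no-match branch instead.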
CodePudding user response:
Use idxmax to get the index of the first 1 among the 0/1s:
df.loc[df['col1'].idxmax(), 'timestamp']
Or to get the first 1 in case there are other values:
df.loc[df['col1'].eq(1), 'timestamp'].iat[0]
output: '2021-01-11 13:00'
Ensuring that a value exists:
s = df.loc[df['col1'].eq(1), 'timestamp']
s.iat[0] if len(s) else 'no value'
Or as a one-liner (the := operator requires Python 3.8+):
s.iat[0] if len(s:=df.loc[df['col1'].eq(1), 'timestamp']) else 'no value'
output if no 1: no value
CodePudding user response:
You can use loc:
df.loc[df.col1 == 1]
This returns all rows where the condition is met.
df.loc[df.col1 == 1].timestamp.values[0]
This returns just the value in the timestamp column for the first row that satisfies the condition.
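Two details of the .values[0] approach are worth checking before relying on it. The sketch below assumes the timestamp column has a datetime64 dtype (the question does not confirm this); in that case .values is a NumPy array, so the element is a numpy.datetime64 rather than a pandas Timestamp, and indexing an empty result raises IndexError. The names first_raw, first_ts and raised are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; datetime64 dtype is an assumption.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-11 11:00",
        "2021-01-11 12:00",
        "2021-01-11 13:00",
        "2021-01-11 14:00",
        "2021-01-11 15:00",
    ]),
    "col1": [0, 0, 1, 1, 0],
})

# .values gives a NumPy array, so this element is a numpy.datetime64.
first_raw = df.loc[df.col1 == 1].timestamp.values[0]

# Convert back if a pandas Timestamp is needed downstream.
first_ts = pd.Timestamp(first_raw)

# No row has col1 == 2, so the array is empty and position 0 does not exist.
try:
    df.loc[df.col1 == 2].timestamp.values[0]
    raised = False
except IndexError:
    raised = True

print(type(first_raw).__name__, first_ts, raised)
```

If a missing match is possible, checking the length of the filtered result first avoids the exception.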