How to find a row based on condition and return a column value of this row?

Time:03-04

I have the following pandas dataframe df:

timestamp          col1
2021-01-11 11:00   0
2021-01-11 12:00   0
2021-01-11 13:00   1
2021-01-11 14:00   1
2021-01-11 15:00   0

I need to get the timestamp of the first row where col1 is equal to 1. The expected answer is 2021-01-11 13:00.

This is my current solution:

first = None
for index,row in df.iterrows():
    if row["col1"] == 1:
        if not first:
            first = row["timestamp"]
            break

How can I simplify it and make it faster?

CodePudding user response:

Solutions that assume at least one value matches:

If the column contains only 0 and 1, use Series.idxmax, which returns the index label of the first maximum:

out = df.loc[df['col1'].idxmax(),'timestamp']

Or, if values other than 0 and 1 are possible, compare with 1 first:

out = df.loc[df['col1'].eq(1).idxmax(),'timestamp']

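Or set timestamp as the index first (a DatetimeIndex if the column holds datetimes), so that idxmax returns the timestamp directly: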

out = df.set_index('timestamp')['col1'].idxmax()

print (out)
2021-01-11 13:00:00
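
For a self-contained check, the three variants above can be run against the sample data from the question. This is a minimal sketch: the way df is constructed here (timestamps parsed with pd.to_datetime, default integer index) is an assumption, since the question does not show how the frame was built.

```python
import pandas as pd

# Rebuild the sample frame from the question (construction is assumed).
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-11 11:00", "2021-01-11 12:00", "2021-01-11 13:00",
        "2021-01-11 14:00", "2021-01-11 15:00",
    ]),
    "col1": [0, 0, 1, 1, 0],
})

# idxmax returns the label of the first occurrence of the maximum.
out_a = df.loc[df["col1"].idxmax(), "timestamp"]
# Comparing with 1 first gives a boolean mask, so other values are safe.
out_b = df.loc[df["col1"].eq(1).idxmax(), "timestamp"]
# With timestamp as the index, idxmax yields the timestamp itself.
out_c = df.set_index("timestamp")["col1"].idxmax()

print(out_a, out_b, out_c)
```

All three should agree on this data; they differ only when col1 holds values other than 0 and 1, or when nothing matches.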

General solution: if nothing matches, idxmax still returns the first index label, so test for a match with any() before using the result:

print (df)
            timestamp  col1
0 2021-01-11 11:00:00     0
1 2021-01-11 12:00:00     0
2 2021-01-11 13:00:00     0
3 2021-01-11 14:00:00     0
4 2021-01-11 15:00:00     0


out = df.set_index('timestamp')['col1'].eq(1).idxmax()
print (out)
2021-01-11 11:00:00

s = df.set_index('timestamp')['col1'].eq(1)
out = s.idxmax() if s.any() else None
print (out)
None
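
The any() guard can be wrapped in a small helper. This is a sketch only: the function name first_match and its parameters are hypothetical, and it looks up the row by position so that it does not depend on the index labels being unique.

```python
import pandas as pd

def first_match(df, value=1, cond_col="col1", out_col="timestamp"):
    """Return out_col from the first row where cond_col == value, else None."""
    mask = df[cond_col].eq(value).to_numpy()
    if not mask.any():
        # Without this guard, argmax would report position 0 on an all-False mask.
        return None
    # argmax on a boolean array gives the position of the first True.
    return df[out_col].iloc[mask.argmax()]

stamps = pd.to_datetime(["2021-01-11 11:00", "2021-01-11 12:00", "2021-01-11 13:00"])
with_match = pd.DataFrame({"timestamp": stamps, "col1": [0, 0, 1]})
without_match = pd.DataFrame({"timestamp": stamps, "col1": [0, 0, 0]})

print(first_match(with_match))
print(first_match(without_match))
```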

CodePudding user response:

Use idxmax to get the index of the first 1 among the 0/1s:

df.loc[df['col1'].idxmax(), 'timestamp']

Or to get the first 1 in case there are other values:

df.loc[df['col1'].eq(1), 'timestamp'].iat[0]

output: '2021-01-11 13:00'

Ensuring that a value exists:

s = df.loc[df['col1'].eq(1), 'timestamp']
s.iat[0] if len(s) else 'no value'

Or as a one-liner (requires Python 3.8+ for the := assignment expression):

s.iat[0] if len(s:=df.loc[df['col1'].eq(1), 'timestamp']) else 'no value'

output if no 1: no value

CodePudding user response:

You can use loc with a boolean mask:

df.loc[df.col1 == 1]

This returns all rows where the condition is met.

df.loc[df.col1 == 1].timestamp.values[0]

This returns just the value in the timestamp column for the first row that satisfies the condition. If no row matches, .values[0] raises an IndexError.
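
If a missing match should not raise, one option is to take the first element of the filtered column with next() and a default. This is an alternative sketch, not part of the answer above; the sample frame and the variable names are assumptions for illustration.

```python
import pandas as pd

# Small sample frame (assumed construction, shortened from the question).
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2021-01-11 11:00", "2021-01-11 12:00", "2021-01-11 13:00"]
    ),
    "col1": [0, 0, 1],
})

# Filter the timestamp column, then take the first element or None.
matches = df.loc[df["col1"] == 1, "timestamp"]
first = next(iter(matches), None)

# Same lookup for a value that never occurs: falls back to None.
no_matches = df.loc[df["col1"] == 5, "timestamp"]
missing = next(iter(no_matches), None)

print(first, missing)
```

Note that the mask is still computed over the whole column, so this avoids the exception but does not stop at the first match the way the original loop does.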
