Obtaining the index of the first occurrence in a column in a dataframe

Time:01-22

I have the following dataframe:

import pandas as pd
d = {'Stages': ['Stage 1', 'Stage 2', 'Stage 2', 'Stage 2', 'Stage 3', 'Stage 1'], 'Start(s)': [0, 630, 780, 840, 900, 930], 'End(s)': [630, 780, 840, 900, 930, 960]}
df = pd.DataFrame(data=d)

    Stages         Start(s) End(s)
0   Stage 1          0      630
1   Stage 2         630     780
2   Stage 2         780     840
3   Stage 2         840     900
4   Stage 3         900     930
5   Stage 1         930     960

I would like to get the index where Stage 2 first appears in the "Stages" column. In this example, it would be 1.

I tried reading discussions on similar problems, but couldn't implement any.

CodePudding user response:

If at least one Stage 2 always exists, compare the column with the value and call Series.idxmax on the resulting boolean mask:

print (df['Stages'].eq('Stage 2').idxmax())
1

If the value might not exist (for example Stage 8), use the next with iter trick, which accepts a default:

print (next(iter(df.index[df['Stages'].eq('Stage 8')]), 'not exist'))
not exist

print (next(iter(df.index[df['Stages'].eq('Stage 2')]), 'not exist'))
1

This matters because when no value matches, the mask is all False, so idxmax returns the first index label even though nothing matched:

print (df['Stages'].eq('Stage 8').idxmax())
0
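
One way to guard against that misleading 0 is to check the mask with Series.any before calling idxmax. The following is only a sketch: the helper name first_index_of and the choice of None as the "not found" result are assumptions for illustration, not part of pandas.

```python
import pandas as pd

d = {'Stages': ['Stage 1', 'Stage 2', 'Stage 2', 'Stage 2', 'Stage 3', 'Stage 1'],
     'Start(s)': [0, 630, 780, 840, 900, 930],
     'End(s)': [630, 780, 840, 900, 930, 960]}
df = pd.DataFrame(data=d)

def first_index_of(frame, column, value):
    """Hypothetical helper: index label of the first row where column == value, else None."""
    mask = frame[column].eq(value)
    # idxmax alone cannot tell "first match" from "no match", so test any() first
    if not mask.any():
        return None
    return mask.idxmax()

print(first_index_of(df, 'Stages', 'Stage 2'))  # 1
print(first_index_of(df, 'Stages', 'Stage 8'))  # None
```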

Another idea is to turn the non-matching rows into missing values with Series.where, then get the first non-missing index with Series.first_valid_index:

print (df['Stages'].where(df['Stages'].eq('Stage 8')).first_valid_index())
None

print (df['Stages'].where(df['Stages'].eq('Stage 2')).first_valid_index())
1
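
Note that all of these return an index label, which only coincides with the row position when the frame has the default 0..n-1 index. As a rough sketch, assuming a frame relabelled with letters (the name df2 and the letter labels are made up for illustration), the position can be taken from the underlying NumPy array instead:

```python
import pandas as pd

d = {'Stages': ['Stage 1', 'Stage 2', 'Stage 2', 'Stage 2', 'Stage 3', 'Stage 1'],
     'Start(s)': [0, 630, 780, 840, 900, 930],
     'End(s)': [630, 780, 840, 900, 930, 960]}
df2 = pd.DataFrame(data=d)
df2.index = list('abcdef')  # illustrative non-default labels

mask = df2['Stages'].eq('Stage 2')

label = mask.idxmax()                # index label of the first True
position = mask.to_numpy().argmax()  # 0-based row position of the first True

# Both results are meaningful only when at least one row matches
if mask.any():
    print(label)     # b
    print(position)  # 1
```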