Home > OS >  How to extract a subset of dataframe based on the last occurence of a string in a column in pandas?
How to extract a subset of dataframe based on the last occurence of a string in a column in pandas?

Time:12-01

I have a dataframe lik below

ID name number
1 Sta 2
1 Danny 5
1 Sta 2
1 elle 4
1 Sta 2
1 jake 9
1 Andy 11
1 Adam 22
1 blah 44
1 blahblah 66

I want to extract the records till the last occurrence of Sta. like this below

ID name number
1 Sta 2
1 Danny 5
1 Sta 2
1 elle 4
1 Sta 2

I am not sure how I can do that. Can someone please suggest?

CodePudding user response:

 first, *_, last = df.index[df.name.eq('Sta')]
df.loc[first:last]

   ID   name  number
0   1    Sta       2
1   1  Danny       5
2   1    Sta       2
3   1   elle       4
4   1    Sta       2

Get the first and last labels for Sta and index df

CodePudding user response:

I would find Sta's last occurrence and use it as the slicing parameter. There's an interesting function I didn't know that I found out working out this answer, last_valid_index() might be really helpful here.

filtered = df.loc[:df[df['name']=='Sta'].last_valid_index(),:]

CodePudding user response:

staRows = df.loc[df.name == "Sta", :]
lastStA = staRows.tail()
lastStaLoc = df.index[lastSta]
final = df.loc[0:lastStaLoc]

altFinal = df.loc[0:df.index[df.loc[df.name == "Sta", :].tail()]]

staRows is every row where name is Sta

lastSta is the last row in staRows

lastStaLoc is the location of lastSta

final should be rows up to lastStaLoc

altFinal is the one line solution if you want to be fancy. This is all untested so you may have to debug a bit :)

CodePudding user response:

Let us do groupby with transform idxmax

df[df.index<=df['name'].eq('Sta').iloc[::-1].groupby(df['ID']).transform('idxmax')]
   ID   name  number
0   1    Sta       2
1   1  Danny       5
2   1    Sta       2
3   1   elle       4
4   1    Sta       2
  • Related