Select DataFrame records from one set of particular records up to the next set in Python

Given an existing DataFrame, I want to get the records starting at a particular run of hp source records and ending just before the next run of hp source records, as in the output shown below.

Here is the example df:

seq_id       file_name      source       date

b21345350    a.txt          ad          2022-04-15
b32145660    e.txt          qe          2022-04-15
c43526890    ace.txt        hp          2022-04-15
re2345566    wer.csv        hp          2022-04-15
b43251044    op.xlsx        fa          2022-04-15
b6512400     ip.csv         jm          2022-04-15
b9123420     tb.xlsx        tp          2022-04-15
b3214563     cv.txt         ux          2022-04-14
b45678900    em.txt         hp          2022-04-14
b65357023    rt.csv         hp          2022-04-14
b90879081    ty.txt         mp          2022-04-14
b19019019    sd.txt         jp          2022-04-14

I want to build the result DataFrame from the first hp source records up to, but not including, the next hp source records, like this:

seq_id       file_name      source       date

c43526890    ace.txt        hp          2022-04-15
re2345566    wer.csv        hp          2022-04-15
b43251044    op.xlsx        fa          2022-04-15
b6512400     ip.csv         jm          2022-04-15
b9123420     tb.xlsx        tp          2022-04-15
b3214563     cv.txt         ux          2022-04-14

Can anyone help me get the result DataFrame above?

CodePudding user response:

Try this:

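# True where the row's source is 'hp'
s = df['source'].eq('hp')
# Number the blocks: the counter goes up at the first row of every run of 'hp' rows
g = (s.ne(s.shift()) & s).cumsum()
# Map block number -> sub-DataFrame, skipping rows before the first 'hp' (block 0)
d = {i: j for i, j in df.loc[g.ne(0)].groupby(g)}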

After you have created the dictionary, you can pull the first group by using the code below:

d.get(1)

Output:

      seq_id file_name source        date
2  c43526890   ace.txt     hp  2022-04-15
3  re2345566   wer.csv     hp  2022-04-15
4  b43251044   op.xlsx     fa  2022-04-15
5   b6512400    ip.csv     jm  2022-04-15
6   b9123420   tb.xlsx     tp  2022-04-15
7   b3214563    cv.txt     ux  2022-04-14
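
To run the approach end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand (how df was originally created is not stated, so the constructor call is an assumption, and first_block is a name introduced only for illustration), then applies the same three lines and prints the block labels alongside the first block.

```python
import pandas as pd

# Sample data copied from the question; the construction itself is an assumption.
df = pd.DataFrame({
    "seq_id": ["b21345350", "b32145660", "c43526890", "re2345566",
               "b43251044", "b6512400", "b9123420", "b3214563",
               "b45678900", "b65357023", "b90879081", "b19019019"],
    "file_name": ["a.txt", "e.txt", "ace.txt", "wer.csv",
                  "op.xlsx", "ip.csv", "tb.xlsx", "cv.txt",
                  "em.txt", "rt.csv", "ty.txt", "sd.txt"],
    "source": ["ad", "qe", "hp", "hp", "fa", "jm",
               "tp", "ux", "hp", "hp", "mp", "jp"],
    "date": ["2022-04-15"] * 7 + ["2022-04-14"] * 5,
})

# Mark the 'hp' rows.
s = df["source"].eq("hp")

# A row starts a new block when it is 'hp' and the previous row was not;
# the running sum of those start flags labels every row with its block number.
g = (s.ne(s.shift()) & s).cumsum()

# Rows labelled 0 come before the first 'hp' row, so they are left out.
d = {i: j for i, j in df.loc[g.ne(0)].groupby(g)}

# first_block is a hypothetical name for the block the question asks for.
first_block = d.get(1)

print(g.tolist())
print(first_block)
```

With this data the labels are 0 for the first two rows, 1 for rows 2 to 7, and 2 for rows 8 to 11, so d.get(2) holds the second block, which starts at the next run of hp rows. d.get returns None for a block number that does not exist.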