Given an existing DataFrame, I want to select the records that start at a run of rows whose source is hp and end just before the next run of hp rows, as in the output shown below.
Here is the example df:
seq_id file_name source date
b21345350 a.txt ad 2022-04-15
b32145660 e.txt qe 2022-04-15
c43526890 ace.txt hp 2022-04-15
re2345566 wer.csv hp 2022-04-15
b43251044 op.xlsx fa 2022-04-15
b6512400 ip.csv jm 2022-04-15
b9123420 tb.xlsx tp 2022-04-15
b3214563 cv.txt ux 2022-04-14
b45678900 em.txt hp 2022-04-14
b65357023 rt.csv hp 2022-04-14
b90879081 ty.txt mp 2022-04-14
b19019019 sd.txt jp 2022-04-14
I want to create a result DataFrame that runs from the first hp rows up to, but not including, the next hp rows, like this:
seq_id file_name source date
c43526890 ace.txt hp 2022-04-15
re2345566 wer.csv hp 2022-04-15
b43251044 op.xlsx fa 2022-04-15
b6512400 ip.csv jm 2022-04-15
b9123420 tb.xlsx tp 2022-04-15
b3214563 cv.txt ux 2022-04-14
Can anyone help me get the above result DataFrame?
CodePudding user response:
Try this:
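# Flag the rows whose source is 'hp'.
s = df['source'].eq('hp')
# Number the blocks: the counter goes up at the first row of each run of 'hp' rows.
g = (s.ne(s.shift()) & s).cumsum()
# Drop the rows before the first 'hp' (g == 0) and map each block number to its rows.
d = {i: j for i, j in df.loc[g.ne(0)].groupby(g)}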
After you have created the dictionary, you can pull the first group by using the code below:
d.get(1)
Output:
seq_id file_name source date
2 c43526890 ace.txt hp 2022-04-15
3 re2345566 wer.csv hp 2022-04-15
4 b43251044 op.xlsx fa 2022-04-15
5 b6512400 ip.csv jm 2022-04-15
6 b9123420 tb.xlsx tp 2022-04-15
7 b3214563 cv.txt ux 2022-04-14
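For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample data from the question and wraps the steps in a helper; the name split_by_marker and its parameters are hypothetical, chosen only for illustration. It uses shift(fill_value=False) so the shifted mask stays boolean, which is a small variation on the snippet above rather than a different technique.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("b21345350", "a.txt", "ad", "2022-04-15"),
    ("b32145660", "e.txt", "qe", "2022-04-15"),
    ("c43526890", "ace.txt", "hp", "2022-04-15"),
    ("re2345566", "wer.csv", "hp", "2022-04-15"),
    ("b43251044", "op.xlsx", "fa", "2022-04-15"),
    ("b6512400", "ip.csv", "jm", "2022-04-15"),
    ("b9123420", "tb.xlsx", "tp", "2022-04-15"),
    ("b3214563", "cv.txt", "ux", "2022-04-14"),
    ("b45678900", "em.txt", "hp", "2022-04-14"),
    ("b65357023", "rt.csv", "hp", "2022-04-14"),
    ("b90879081", "ty.txt", "mp", "2022-04-14"),
    ("b19019019", "sd.txt", "jp", "2022-04-14"),
]
df = pd.DataFrame(rows, columns=["seq_id", "file_name", "source", "date"])


def split_by_marker(frame, column="source", marker="hp"):
    """Hypothetical helper: split `frame` into blocks that each start at a
    run of `marker` rows and end just before the next run."""
    is_marker = frame[column].eq(marker)
    # True only on the first row of each consecutive run of marker rows.
    run_start = is_marker & ~is_marker.shift(fill_value=False)
    # Running count of run starts; rows before the first marker get 0.
    block_id = run_start.cumsum()
    keep = block_id.ne(0)
    return {key: block for key, block in frame[keep].groupby(block_id[keep])}


blocks = split_by_marker(df)
first_block = blocks[1]  # rows from the first hp run up to the next hp run
print(first_block)
```

Note that the last block (key 2 here) has no following hp run, so it simply extends to the end of the DataFrame.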