I have a list:
my_list = [1,1,2,3,4,4]
and a DataFrame that looks like this:
col_1 col_2
a 1
b 1
c 2
d 3
e 3
f 4
g 4
h 4
I want the final DataFrame to look like this:
col_1 col_2
a 1
b 1
c 2
d 3
f 4
g 4
I can't use
my_df[my_df['col_2'].isin(my_list)]
since this keeps every row whose col_2 value appears in the list. For each item in the list I want the first row that matches it, so each value should be kept only as many times as it occurs in the list.
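For reference, here is a minimal reproducible sketch of the setup. The DataFrame is rebuilt from the table shown above, and the variable name filtered is only illustrative. It shows that isin tests membership only, so the duplicates in my_list have no effect and all eight rows come back.

```python
import pandas as pd

my_list = [1, 1, 2, 3, 4, 4]
my_df = pd.DataFrame({
    'col_1': list('abcdefgh'),
    'col_2': [1, 1, 2, 3, 3, 4, 4, 4],
})

# isin checks membership only; how often a value occurs in my_list is ignored
filtered = my_df[my_df['col_2'].isin(my_list)]
print(len(filtered))  # 8: rows e and h are kept, although the list has one 3 and two 4s
```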
CodePudding user response:
Use GroupBy.cumcount to add an occurrence counter to both the original DataFrame and a helper DataFrame built from the list, then filter with an inner join in DataFrame.merge:
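import pandas as pd

my_list = [1,1,2,3,4,4]
# helper DataFrame: one row per list item; 'g' numbers the repeats of each value (0, 1, ...)
df1 = pd.DataFrame({'col_2':my_list})
df1['g'] = df1.groupby('col_2').cumcount()
# same counter on the original: 'g' is the position of each row within its col_2 group
my_df['g'] = my_df.groupby('col_2').cumcount()
# inner join on the shared columns (col_2, g) keeps only the rows that pair up with a list item
df = my_df.merge(df1).drop('g', axis=1)
print (df)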
col_1 col_2
0 a 1
1 b 1
2 c 2
3 d 3
4 f 4
5 g 4
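A self-contained variant of the same idea may be easier to reuse and test. This is only a sketch: the function name take_first_matches and the temporary column name _g are made up here, and my_df is rebuilt from the table in the question. It uses assign so that my_df itself is not modified, and it names the join keys explicitly instead of relying on the shared column names.

```python
import pandas as pd

def take_first_matches(df, values, col):
    """For each value, keep as many leading rows of df as the value occurs in values."""
    # Helper frame: one row per list item, with a running count per value
    helper = pd.DataFrame({col: values})
    helper['_g'] = helper.groupby(col).cumcount()
    # Same running count on a copy of df, so the caller's frame stays unchanged
    counted = df.assign(_g=df.groupby(col).cumcount())
    # Inner join keeps only rows whose (value, count) pair also exists in the helper
    return counted.merge(helper, on=[col, '_g']).drop(columns='_g')

my_df = pd.DataFrame({
    'col_1': list('abcdefgh'),
    'col_2': [1, 1, 2, 3, 3, 4, 4, 4],
})

result = take_first_matches(my_df, [1, 1, 2, 3, 4, 4], 'col_2')
print(result['col_1'].tolist())  # ['a', 'b', 'c', 'd', 'f', 'g']
```

If a value occurs more often in the list than in the DataFrame, the inner join simply returns the rows that exist.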