How to speed up list appending from dataframe.loc operation

I am trying to get the index of the first occurrence of each unique component name and append those indices to a list.

I have a dataframe with around 20k rows.

mylist = []

for i in df['name']:
    mylist.append(df.loc[df.name == i].index[0])
mylist = set(mylist)

How can I speed up the above process? It takes around a minute to build the list from the dataframe.

CodePudding user response:

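I can't test this without sample data, but something like this should work (sort=False keeps the groups in order of first appearance; without it the result comes back sorted by name):

df.reset_index().groupby('name', sort=False)['index'].first().to_list()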
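
As an alternative that is not part of the answer above, here is a sketch that avoids groupby: `Series.duplicated()` flags every repeat of a value, so negating it leaves only the first occurrence of each name. The small frame is made-up illustration data, and the variable name `first_idx` is just a placeholder.

```python
import pandas as pd

# Made-up illustration data; any frame with a 'name' column behaves the same way.
df = pd.DataFrame({'name': ['a', 'b', 'a', 'c', 'b']})

# duplicated() is True for every repeat of a value,
# so ~duplicated() selects the first occurrence of each name.
first_idx = df.index[~df['name'].duplicated()].tolist()
print(first_idx)  # [0, 1, 3]
```

This is a single vectorized pass over the column, and it returns the indices in their original row order.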

A minimal reproducible example would look like this:

import pandas as pd
df = pd.DataFrame({'name': ['ABBA', 'LZ', 'LZ', 'LZ', 'IronMaiden', 'PinkFloyd', 'LZ', 'PinkFloyd']})

DataFrame:

         name
0        ABBA
1          LZ
2          LZ
3          LZ
4  IronMaiden
5   PinkFloyd
6          LZ
7   PinkFloyd

Desired outcome:

[0, 1, 4, 5]
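
To check this on the sample data, the sketch below runs the loop from the question next to a grouped one-liner and compares the results. It assumes the default unnamed index, so that `reset_index()` produces a column called `index`. `sort=False` is used here so the groups come back in order of first appearance, and the names `slow` and `fast` are placeholders.

```python
import pandas as pd

df = pd.DataFrame({'name': ['ABBA', 'LZ', 'LZ', 'LZ', 'IronMaiden',
                            'PinkFloyd', 'LZ', 'PinkFloyd']})

# Approach from the question: for every row, scan the whole column again.
# That is roughly n * n comparisons, which is why 20k rows take so long.
slow = {int(df.loc[df.name == n].index[0]) for n in df['name']}

# Grouped approach: one pass over the data.
# sort=False keeps groups in order of first appearance instead of sorting by name.
fast = df.reset_index().groupby('name', sort=False)['index'].first().to_list()

print(fast)  # [0, 1, 4, 5]
assert set(fast) == slow
```

The loop repeats the same lookup for every duplicate name, while the grouped version touches each row once, so the gap should widen as the dataframe grows.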