I am trying to get the index of the first occurrence of each unique component name and append those indices to a list.
I have a DataFrame with around 20k rows.
mylist = []
for i in df['name']:
    mylist.append(df.loc[df.name == i].index[0])
mylist = set(mylist)
How can I speed this up? The loop takes around a minute to build the list from the DataFrame.
CodePudding user response:
I can't test this without sample data, but something like this should work:
df.reset_index().groupby('name', sort=False).first()['index'].to_list()
A minimal reproducible example would look like this:
import pandas as pd
df = pd.DataFrame({'name': ['ABBA', 'LZ', 'LZ', 'LZ', 'IronMaiden', 'PinkFloyd', 'LZ', 'PinkFloyd']})
DataFrame:
name
0 ABBA
1 LZ
2 LZ
3 LZ
4 IronMaiden
5 PinkFloyd
6 LZ
7 PinkFloyd
Desired outcome:
[0, 1, 4, 5]
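For comparison, here is a minimal sketch of another vectorized way to get the same result, using the sample data above. It relies on `Series.duplicated()`, which is not part of the original answer, and the variable names `first_mask`, `first_indices` and `via_groupby` are only illustrative. The groupby variant assumes the index is unnamed, so that `reset_index()` produces a column called `index`.

```python
import pandas as pd

# Sample data copied from the example above.
df = pd.DataFrame({'name': ['ABBA', 'LZ', 'LZ', 'LZ', 'IronMaiden', 'PinkFloyd', 'LZ', 'PinkFloyd']})

# duplicated() flags every repeat of a name as True and leaves the
# first occurrence False, so inverting it marks the first occurrences.
first_mask = ~df['name'].duplicated()

# Pick the index labels where the mask is True.
first_indices = df.index[first_mask.to_numpy()].tolist()
print(first_indices)  # [0, 1, 4, 5]

# The groupby approach, with sort=False so groups keep their
# order of first appearance instead of being sorted by name.
via_groupby = df.reset_index().groupby('name', sort=False)['index'].first().tolist()
print(via_groupby)  # [0, 1, 4, 5]
```

Both versions avoid the Python-level loop that rescans the whole DataFrame once per row, which is likely where most of the minute goes. If a set is needed, as in the question, wrap either result in `set(...)`.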