I have the following DataFrame and list of data:
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14],
        ['test', 14], ['test1', 12], ['test1', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
>>> df
    Name  Age
0    tom   10
1   nick   15
2   juli   14
3   test   14
4  test1   12
5  test1   14
index_list=[['test1','juli'],['nick'],['tom','test']]
>>> index_list
[['test1', 'juli'], ['nick'], ['tom', 'test']]
I would like to add a column cluster_id
to the DataFrame that holds, for each row, the index of the sublist
in index_list containing that row's Name, so the output should look like this:
>>> df
    Name  Age  cluster_id
0    tom   10           2
1   nick   15           1
2   juli   14           0
3   test   14           2
4  test1   12           0
5  test1   14           0
Answer:
You could convert index_list
into a dictionary that maps each name to its cluster id using a dict comprehension, then map
that dictionary onto the "Name" column:
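# Map every name to the position of the sublist that contains it
index_dic = {name: i for i, sublist in enumerate(index_list) for name in sublist}
# Look up each row's Name in that dictionary
df['cluster_id'] = df['Name'].map(index_dic)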
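To make the dict comprehension easier to read, here is a small sketch that builds the same dictionary with explicit nested loops and prints the intermediate mapping. The names loop_dic and dup are only for illustration; the last lines show what happens if a name were to appear in more than one sublist (a hypothetical case not present in the question's data).

```python
index_list = [['test1', 'juli'], ['nick'], ['tom', 'test']]

# Outer loop numbers each sublist; inner loop assigns that number to every name in it
index_dic = {name: i for i, sublist in enumerate(index_list) for name in sublist}

# The same mapping written as explicit nested loops, for comparison
loop_dic = {}
for i, sublist in enumerate(index_list):
    for name in sublist:
        loop_dic[name] = i

print(index_dic)  # {'test1': 0, 'juli': 0, 'nick': 1, 'tom': 2, 'test': 2}

# Hypothetical duplicate: 'a' appears in two sublists, so the later sublist wins
dup = {name: i for i, sublist in enumerate([['a'], ['a', 'b']]) for name in sublist}
print(dup)  # {'a': 1, 'b': 1}
```

If a name can legitimately belong to several clusters, the dictionary approach silently keeps only the last one, so it is worth checking the input for duplicates first.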
Output:
    Name  Age  cluster_id
0    tom   10           2
1   nick   15           1
2   juli   14           0
3   test   14           2
4  test1   12           0
5  test1   14           0
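One caveat: Series.map returns NaN for any name that is not in the dictionary, and a single NaN turns the whole column into floats. The sketch below assumes a hypothetical extra row ('alice') that is missing from index_list and shows two ways to keep integer ids: pandas' nullable "Int64" dtype, or filling with a sentinel value (-1 here is an arbitrary choice, not something from the question) before casting to int.

```python
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14],
        ['test', 14], ['test1', 12], ['test1', 14],
        ['alice', 20]]  # 'alice' is a hypothetical name not in any sublist
df = pd.DataFrame(data, columns=['Name', 'Age'])

index_list = [['test1', 'juli'], ['nick'], ['tom', 'test']]
index_dic = {name: i for i, sublist in enumerate(index_list) for name in sublist}

# map() yields NaN for names missing from index_dic, so this Series is float64
mapped = df['Name'].map(index_dic)

# Option 1: nullable integer dtype keeps the missing value as <NA>
df['cluster_id'] = mapped.astype('Int64')

# Option 2: fill with a sentinel (-1, an arbitrary choice) and cast to plain int
df['cluster_id_filled'] = mapped.fillna(-1).astype(int)

print(df)
```

Which option fits better depends on what downstream code expects: the nullable dtype keeps "unassigned" explicit, while the sentinel keeps a plain NumPy int column.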