What would be the most efficient way to create a list of the labels from the dataframe below in the order of mylist?
import numpy as np
import pandas as pd
mylist = ['a1.jpeg','a2.jpeg','b1.jpeg','b2.jpeg','c1.jpeg','c2.jpeg']
dat = np.array([(1, 2, 1, 1, 2, 2), ('a2jpeg', 'a1jpeg', 'c2jpeg', 'b2jpeg', 'b1jpeg' , 'c1jpeg')])
df = pd.DataFrame(np.transpose(dat), columns=['labels', 'filenames'])
df
  labels filenames
0      1    a2jpeg
1      2    a1jpeg
2      1    c2jpeg
3      1    b2jpeg
4      2    b1jpeg
5      2    c1jpeg
CodePudding user response:
Just use sort_values:
>>> df.sort_values('filenames')
labels filenames
1 2 a1jpeg
0 1 a2jpeg
4 2 b1jpeg
3 1 b2jpeg
5 2 c1jpeg
2 1 c2jpeg
To get the sorted filenames as a list:
>>> df['filenames'].sort_values().tolist()
['a1jpeg', 'a2jpeg', 'b1jpeg', 'b2jpeg', 'c1jpeg', 'c2jpeg']
CodePudding user response:
Use Series.replace first to restore the missing dot in the filenames, then put the labels in the order of mylist with DataFrame.set_index and Series.reindex:
L = (df.assign(filenames=df['filenames'].replace('jpeg', '.jpeg', regex=True))
       .set_index('filenames')['labels']
       .reindex(mylist)
       .tolist())
print(L)
['2', '1', '2', '1', '2', '1']
Or:
df['filenames'] = pd.Categorical(df['filenames'].replace('jpeg', '.jpeg', regex=True),
                                 ordered=True,
                                 categories=mylist)
L = df.sort_values(by='filenames')['labels'].tolist()
print(L)
['2', '1', '2', '1', '2', '1']
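Both variants above amount to looking up each filename's label in the order of mylist. As a rough sketch of the same idea without reindexing, a plain dictionary lookup should also work. This is not part of the original answer: the names `fixed`, `label_by_file` and `L_lookup` are made up for illustration, and the labels are written as strings to match what the question's np.array construction produces.

```python
import pandas as pd

mylist = ['a1.jpeg', 'a2.jpeg', 'b1.jpeg', 'b2.jpeg', 'c1.jpeg', 'c2.jpeg']

# Same data as in the question; labels are strings because the original
# np.array mixes numbers and text and therefore stores everything as text.
df = pd.DataFrame({
    'labels': ['1', '2', '1', '1', '2', '2'],
    'filenames': ['a2jpeg', 'a1jpeg', 'c2jpeg', 'b2jpeg', 'b1jpeg', 'c1jpeg'],
})

# Restore the dot so the names match the entries of mylist (literal replace).
fixed = df['filenames'].str.replace('jpeg', '.jpeg', regex=False)

# Hypothetical helper mapping: filename -> label.
label_by_file = dict(zip(fixed, df['labels']))

# Pick the labels in the order given by mylist.
L_lookup = [label_by_file[name] for name in mylist]
print(L_lookup)
```

Unlike reindex, which yields NaN for a name that is missing from the DataFrame, this lookup raises a KeyError, so it only suits cases where every entry of mylist is known to be present.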
If the order of mylist is simply the sorted order of the filenames, the solution can be simplified to a plain DataFrame.sort_values:
L = df.sort_values(by='filenames')['labels'].tolist()
print (L)
['2', '1', '2', '1', '2', '1']
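The plain sort only gives the right answer when mylist happens to be in sorted order. A minimal sketch of how one might check that assumption before relying on it is shown here. The variable names `names`, `sort_is_enough` and `labels_sorted` are illustrative, and the DataFrame is built from a dict of columns (an assumption, not the question's construction) so the labels stay integers.

```python
import pandas as pd

mylist = ['a1.jpeg', 'a2.jpeg', 'b1.jpeg', 'b2.jpeg', 'c1.jpeg', 'c2.jpeg']

# Building from a dict of columns keeps each column's own dtype,
# so labels remain integers instead of becoming strings.
df = pd.DataFrame({
    'labels': [1, 2, 1, 1, 2, 2],
    'filenames': ['a2jpeg', 'a1jpeg', 'c2jpeg', 'b2jpeg', 'b1jpeg', 'c1jpeg'],
})

# Restore the dot so the names are comparable with mylist (literal replace).
names = df['filenames'].str.replace('jpeg', '.jpeg', regex=False)

# True only if mylist contains exactly these names in sorted order.
sort_is_enough = sorted(names) == mylist

# Labels in sorted-filename order.
labels_sorted = df.sort_values(by='filenames')['labels'].tolist()
print(sort_is_enough, labels_sorted)
```

If the check comes out False, the reindex or Categorical variants are the safer choice, since they follow mylist explicitly.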