I have a fixed array, e.g. sort_by = [a,b,c,d,e,f]. My dataframe looks like this (I have made Column1 my index):
Column1 | Column2 | ...
d 1
d 2
b 3
a 4
a 5
b 6
c 7
I want to use .loc with the sort_by list to sort the rows. However, sometimes not all values of sort_by are present in Column1, which results in an "index not found" error. How do I get it to "try" to the best of its ability?
s.set_index('mitre_attack_tactic', inplace=True)
print(s.loc[sort_by])      # doesn't work
print(s.loc[[a,b,c,d]])    # does work, however Column1 could have e, f, g
CodePudding user response:
You can use the key argument of df.sort_values. The idea is to build a dictionary that maps each value in sort_by to its position in the list, map the column through that dictionary, and sort by the resulting positions.
key = {v:k for k, v in enumerate(sort_by)}
df = df.sort_values('Column1', key=lambda col: col.map(key))
print(df)
Column1 Column2
3 a 4
4 a 5
2 b 3
5 b 6
6 c 7
0 d 1
1 d 2
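As a self-contained sketch of the approach above: the sample data is rebuilt from the question, with one extra row labelled "g" added as an assumption to show what happens to a value that is not in sort_by. The name rank is illustrative. Labels missing from the dictionary map to NaN, and sort_values places NaN last by default.

```python
import pandas as pd

sort_by = ["a", "b", "c", "d", "e", "f"]

# Data from the question, plus a hypothetical "g" row that is not in sort_by.
df = pd.DataFrame({
    "Column1": ["d", "d", "b", "a", "a", "b", "c", "g"],
    "Column2": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Map each label to its position in sort_by.
rank = {label: pos for pos, label in enumerate(sort_by)}

# Labels not in rank become NaN and are placed last (na_position="last").
# kind="mergesort" keeps the original row order among equal labels.
result = df.sort_values("Column1", key=lambda col: col.map(rank), kind="mergesort")
print(result)
```

Labels in sort_by that never appear in the column (here "e" and "f") simply have no effect, so no lookup error is raised.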
CodePudding user response:
Let us try pd.Categorical
out = df.iloc[pd.Categorical(df.Column1,['a','b','c','d']).argsort()]
Out[48]:
Column1 Column2
3 a 4
4 a 5
2 b 3
5 b 6
6 c 7
0 d 1
1 d 2
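The categories do not have to be trimmed by hand to the values that are present. A minimal sketch, assuming the sample data from the question, that passes the full sort_by list as the categories; categories that never occur ("e" and "f") are simply unused:

```python
import pandas as pd

sort_by = ["a", "b", "c", "d", "e", "f"]

df = pd.DataFrame({
    "Column1": ["d", "d", "b", "a", "a", "b", "c"],
    "Column2": [1, 2, 3, 4, 5, 6, 7],
})

# Build an ordered categorical whose order is the full sort_by list.
cat = pd.Categorical(df["Column1"], categories=sort_by, ordered=True)

# argsort returns the row positions that put the categories in order.
out = df.iloc[cat.argsort()]
print(out)
```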
CodePudding user response:
This page helps:
If you create your sort_by as a categorical:
sort_by = pd.api.types.CategoricalDtype(["a","b","c","d","e","f"], ordered=True)
Then change your column to a categorical:
s['Column1'] = s['Column1'].astype(sort_by)
You can then sort it (sort_values returns a new DataFrame, so assign the result if you want to keep it):
s = s.sort_values('Column1')
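Put together, the steps above might look like the following sketch. It assumes Column1 is an ordinary column (call reset_index first if it is the index), and it adds a hypothetical "g" row to show that a value outside the categories becomes NaN after astype and is sorted last. The name order is illustrative.

```python
import pandas as pd

# Ordered categorical dtype carrying the desired sort order.
order = pd.api.types.CategoricalDtype(["a", "b", "c", "d", "e", "f"], ordered=True)

# Data from the question, plus a hypothetical "g" row not covered by the dtype.
s = pd.DataFrame({
    "Column1": ["d", "d", "b", "a", "a", "b", "c", "g"],
    "Column2": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Values outside the categories ("g") are converted to NaN here.
s["Column1"] = s["Column1"].astype(order)

# NaN rows go last by default; the rest follow the category order.
result = s.sort_values("Column1")
print(result)
```

Note that the original "g" label is lost in the conversion, so this variant fits best when every value of interest is listed in the categories.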
CodePudding user response:
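Use Index.intersection to keep only the labels from sort_by that actually exist in the index, then pass those to .loc: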
df.loc[pd.Index(sort_by).intersection(df.index)]
Column2
a 4
a 5
b 3
b 6
c 7
d 1
d 2
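A closely related sketch that filters sort_by with a boolean mask (Index.isin) instead of intersection, which makes it explicit that the order of sort_by is kept. This is a variant, not the exact call from the answer; the data is rebuilt from the question and the names order and present are illustrative.

```python
import pandas as pd

sort_by = ["a", "b", "c", "d", "e", "f"]

# Column1 is the index, as in the question.
df = pd.DataFrame(
    {"Column2": [1, 2, 3, 4, 5, 6, 7]},
    index=pd.Index(["d", "d", "b", "a", "a", "b", "c"], name="Column1"),
)

order = pd.Index(sort_by)

# Keep only the labels that exist in df.index, in sort_by order.
present = order[order.isin(df.index)]

# .loc with a list of labels returns every matching row, label by label.
result = df.loc[present]
print(result)
```

Because "e" and "f" are dropped before the lookup, .loc never sees a missing label and no KeyError is raised.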