Let's suppose my dataset looks like this:
ID | School Name | School Type | Class size | Specialization |
---|---|---|---|---|
0 | New Jersey State School | State School | 7000 | General |
1 | American International School | Private International School | 200 | STEM |
2 | Boston Arts High School | Private National School | 500 | Arts |
3 | New Hamptons Academy | Private National School | 300 | Humanities |
4 | Colorado State School | State School | 10000 | General |
... | ... | ... | ... | ... |
I have defined the following key dictionaries:
```python
school_type_key = {'State School': 0, 'Private National School': 1, 'Private International School': 2}
specialization_key = {'STEM': 0, 'Humanities': 1, 'Arts': 2, 'Life Sciences': 3, 'General': 4, 'Other': 5}
```
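Each dictionary assigns a rank to every category, so sorting a dictionary's own keys by their values yields the full intended order. That order also includes categories absent from the data, such as 'Life Sciences' and 'Other', which is why the column's unique values still matter. A minimal plain-Python illustration, where the `present` set is a hand-typed stand-in for the column's unique values:

```python
specialization_key = {'STEM': 0, 'Humanities': 1, 'Arts': 2, 'Life Sciences': 3, 'General': 4, 'Other': 5}

# Sort the dictionary's keys by their rank values
full_order = sorted(specialization_key, key=specialization_key.get)
print(full_order)
# ['STEM', 'Humanities', 'Arts', 'Life Sciences', 'General', 'Other']

# Hand-typed stand-in for the unique values found in the 'Specialization' column
present = {'General', 'STEM', 'Arts', 'Humanities'}

# Walk the full order and keep only categories that actually occur
ordered_present = [s for s in full_order if s in present]
print(ordered_present)
# ['STEM', 'Humanities', 'Arts', 'General']
```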
I need to write a function that:
- takes a column name
- makes a list of that column's unique values
- sorts this list based on the matching key dictionary
- returns the list
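The four steps can be sketched in plain Python, assuming the column is already available as a list (a stand-in for `df[column]`) and the matching key dictionary is passed in directly; `unique_sorted` is a hypothetical name:

```python
def unique_sorted(values, order_key):
    """Return the distinct entries of `values`, ordered by their rank in `order_key`."""
    # dict.fromkeys drops duplicates while keeping first-seen order (Python 3.7+)
    unique_values = list(dict.fromkeys(values))
    # Rank each value by its dictionary entry; a value missing from
    # order_key raises KeyError here instead of failing later in the sort
    unique_values.sort(key=lambda v: order_key[v])
    return unique_values


school_type_key = {'State School': 0, 'Private National School': 1, 'Private International School': 2}
# The 'School Type' values of rows 0-4 in the sample table
column = ['State School', 'Private International School', 'Private National School',
          'Private National School', 'State School']
print(unique_sorted(column, school_type_key))
# ['State School', 'Private National School', 'Private International School']
```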
Here is what I tried:
```python
def sort_by_keys(column):
    unique_values = df[column].unique()
    unique_values = list(unique_values)
    key = column.replace(' ', '_').lower() + '_key'
    # I need key so that if I pass the 'School Type' column, it is sorted by school_type_key
    unique_values.sort(key=key.get, reverse=True)
    return unique_values

sort_by_keys('School Type')
```
However, this raises an error because `key` is a string (`'school_type_key'`), not the dictionary itself, and a string has no `get` method. How can I solve this?
Expected output:
```python
output = sort_by_keys('School Type')
print(output)
# ['State School', 'Private National School', 'Private International School']

output = sort_by_keys('Specialization')
print(output)
# ['STEM', 'Humanities', 'Arts', 'General']
```
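The error arises because the string `'school_type_key'` only names a variable; it is not the dictionary itself. One way to go from the derived name to the actual dictionary is an explicit registry keyed by that name (`KEY_REGISTRY` below is a hypothetical name). This sketch also drops `reverse=True`, since the expected output is in ascending rank order:

```python
school_type_key = {'State School': 0, 'Private National School': 1, 'Private International School': 2}
specialization_key = {'STEM': 0, 'Humanities': 1, 'Arts': 2, 'Life Sciences': 3, 'General': 4, 'Other': 5}

key_name = 'School Type'.replace(' ', '_').lower() + '_key'
print(key_name)  # school_type_key  (a plain str, not the dict)

try:
    key_name.get  # str has no .get, which is what the question's code runs into
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'get'

# Hypothetical registry: maps each derived name to the dictionary object itself
KEY_REGISTRY = {
    'school_type_key': school_type_key,
    'specialization_key': specialization_key,
}

order = KEY_REGISTRY[key_name]
values = ['Private International School', 'State School', 'Private National School']
print(sorted(values, key=order.get))
# ['State School', 'Private National School', 'Private International School']
```

`globals()[key_name]` would also resolve the name, but an explicit dictionary keeps the lookup visible and does not depend on module-level variable names.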
Answer:
You can map each column name to its key dictionary and let `sort_values` rank the values through that mapping:
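```python
# Map each column name to the dictionary that defines its order
MAPPING = {
    'School Type': school_type_key,
    'Specialization': specialization_key,
}

def sort_by_keys(col):
    # Sort the whole column by each value's rank, then keep the unique values;
    # unique() preserves order of appearance, so the sorted order survives
    return (df[col].sort_values(key=lambda x: x.map(MAPPING[col]))
                   .unique().tolist())
```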
Output:
```python
>>> sort_by_keys('School Type')
['State School', 'Private National School', 'Private International School']
>>> sort_by_keys('Specialization')
['STEM', 'Humanities', 'Arts', 'General']
```
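The answer sorts the whole column first and deduplicates second; this works because `unique()` returns values in order of appearance. A rough plain-Python analogue of that pipeline is below. One assumption worth flagging: values missing from the key dictionary become NaN under `map`, and `sort_values` places NaN last by default, so the sketch gives unmapped values an infinite rank to land them at the end as well (`'Robotics'` is a made-up category for illustration, and `sort_then_unique` is a hypothetical name):

```python
from math import inf


def sort_then_unique(values, order_key):
    # Step 1: sort every row by its rank; unmapped values get rank inf so they sort last
    ranked = sorted(values, key=lambda v: order_key.get(v, inf))
    # Step 2: drop duplicates, keeping first-seen order, which is now rank order
    return list(dict.fromkeys(ranked))


specialization_key = {'STEM': 0, 'Humanities': 1, 'Arts': 2, 'Life Sciences': 3, 'General': 4, 'Other': 5}
column = ['General', 'STEM', 'Arts', 'Humanities', 'General', 'Robotics']
print(sort_then_unique(column, specialization_key))
# ['STEM', 'Humanities', 'Arts', 'General', 'Robotics']
```

Deduplicating before sorting, as the question's attempt does, gives the same result and sorts fewer items when the column is long.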