I have a couple of DataFrames with varying numbers of rows and columns, such as this one:
df =
    c1  c2  c3  c4
r1   8   4   5   2
r2   2   5   9   2
I need to loop through each column, find the row with the highest value, and add that row to a dictionary as a key, with a list of column names as the value. So for c1 I would get r1 as key and c1 as value, and for c2 I would get r2 as key and c2 as value. Then for c3 I would get r2 as key again, and its value would now be [c2, c3]. Furthermore, c4 has the same value in every row, so every key needs c4 added to its list of values.
The dataframe I'm working on has many more columns and many more rows.
The final dictionary should look like this:
dict = {"r1": ["c1","c4"], "r2": ["c2","c3","c4"]}
This seemed at first not that hard, but I am quite stuck.
The most difficult part is keeping a separate list of values for each key.
Any help would be greatly appreciated!
CodePudding user response:
IIUC, you could compare each value with the max of its column,
then keep only the entries that match and reshape them into a dictionary:
s = df.eq(df.max()).stack()
s[s].reset_index(level=1).groupby(level=0)['level_1'].agg(list).to_dict()
output: {'r1': ['c1', 'c4'], 'r2': ['c2', 'c3', 'c4']}
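In case it helps to see the intermediate steps, here is the same idea as a self-contained sketch. The DataFrame is rebuilt from the question's example, and the names is_max, hits and result are only illustrative. It assumes the row index and the columns are unnamed, which is why the column created by reset_index is called 'level_1'.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {"c1": [8, 2], "c2": [4, 5], "c3": [5, 9], "c4": [2, 2]},
    index=["r1", "r2"],
)

# True wherever a cell equals the maximum of its column.
# Ties (like c4) produce several True values in the same column.
is_max = df.eq(df.max())

# Long format: a boolean Series indexed by (row label, column label).
s = is_max.stack()

# Keep only the True entries, move the column labels into a regular
# column named 'level_1', then collect them into one list per row label.
hits = s[s].reset_index(level=1)
result = hits.groupby(level=0)["level_1"].agg(list).to_dict()

print(result)  # {'r1': ['c1', 'c4'], 'r2': ['c2', 'c3', 'c4']}
```

Rows that never hold a column maximum simply do not appear as keys.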
Alternative approach, taking advantage of the fact that Index.groupby
returns a dict:
s = df.eq(df.max()).stack()
s = s[s].reset_index(level=0)['level_0']
out = s.index.groupby(s)
# dict with keys 'r1' and 'r2'; the values are pandas Index objects
# holding ['c1', 'c4'] and ['c2', 'c3', 'c4'] respectively
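One caveat: as far as I know, Index.groupby gives back Index objects as the dict values rather than plain lists. If you need real lists, a small dict comprehension on top should do it. This is a sketch on the question's example data; grouped is just an illustrative name.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {"c1": [8, 2], "c2": [4, 5], "c3": [5, 9], "c4": [2, 2]},
    index=["r1", "r2"],
)

s = df.eq(df.max()).stack()
# After this step the index holds column labels and the values hold row labels.
s = s[s].reset_index(level=0)["level_0"]

# Index.groupby returns a dict mapping each row label to an Index of column labels.
grouped = s.index.groupby(s)

# Convert each Index into a plain Python list.
out = {row: list(cols) for row, cols in grouped.items()}

print(out)  # {'r1': ['c1', 'c4'], 'r2': ['c2', 'c3', 'c4']}
```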
CodePudding user response:
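Let us try with rank: the maximum of each column gets dense rank 1, so for each row we keep the columns where its rank is 1.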
d = (df.rank(ascending=False, method='dense')
       .apply(lambda x: x.index[x == 1].tolist(), axis=1)
       .to_dict())
Out[52]: {'r1': ['c1', 'c4'], 'r2': ['c2', 'c3', 'c4']}
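For comparison, since the question mentions looping over the columns and keeping the lists separate per key, here is a plain-loop sketch that should give the same mapping. It is not the rank approach above, just an explicit version using collections.defaultdict, which creates a fresh list for every new key. The names result and col_max are illustrative.

```python
from collections import defaultdict

import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {"c1": [8, 2], "c2": [4, 5], "c3": [5, 9], "c4": [2, 2]},
    index=["r1", "r2"],
)

# defaultdict(list) gives every new key its own empty list,
# so values never get shared between keys.
result = defaultdict(list)

for col in df.columns:
    col_max = df[col].max()
    # All row labels that hold the column maximum (more than one on ties).
    mask = (df[col] == col_max).to_numpy()
    for row in df.index[mask]:
        result[row].append(col)

result = dict(result)  # back to a regular dict
print(result)  # {'r1': ['c1', 'c4'], 'r2': ['c2', 'c3', 'c4']}
```

On large frames the vectorized versions will likely be faster, but the loop is easy to adapt if the rule for picking rows changes.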