How do we get an optimum key value pair from a list of dictionaries in a dataframe column based on c

Time:09-23

I have the following dataframe:

enter image description here

Different 'type' values can occur at the same 'time', but I only need the 'type' and 'value' selected by the following conditions:

  • priority 1: type importance follows the order t > o > f
  • priority 2: the highest value in the 'value' column is taken

I have tried using groupby and dictionary with:

grp = merged_df.groupby(['name','time'],as_index=False)[['type','value']].apply(lambda x: dict(x.values.tolist()))

This gives the following output:

enter image description here

Is there any way to get the optimum key value pair based on the above two rules so that the output can be:

enter image description here

The above is one method I tried, and that's why the question is about the optimum key value pair from a dictionary. However, any other more elegant method would also be fine.

The end result should be:

enter image description here

CodePudding user response:

The following answer might not be that elegant, since (1) it requires you to create a dictionary beforehand that maps priority 1 (t > o > f), and (2) it bypasses working with dictionaries altogether, but it should get the job done:

First, create a dictionary according to priority 1 and use it to create a new mapped column based on 'type':

# Higher number = more important type (priority 1: t > o > f)
prio_dic = {'t': 3, 'o': 2, 'f': 1}
data["coded_type"] = data["type"].map(prio_dic)

Finally, sort the values first by the new column 'coded_type' and then by 'value' (both descending), group by 'name' and 'time' as you did, and take the first row of each group:

res = data.sort_values(['coded_type', 'value'], ascending=False).groupby(['name', 'time'], as_index=False).first()

Unfortunately, this requires the extra step of dropping the helper column. Note that drop returns a new dataframe, so the result has to be assigned back:

res = res.drop('coded_type', axis=1)
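For reference, here is a minimal end-to-end sketch of the steps above. The sample data is made up, since the original dataframe was only posted as an image; only the column names 'name', 'time', 'type' and 'value' are taken from the question. It uses assign so the helper column never has to be added to the original dataframe:

```python
import pandas as pd

# Hypothetical sample data (the question's dataframe was only shown as an image).
data = pd.DataFrame({
    "name":  ["a", "a", "a", "a", "b", "b"],
    "time":  [1, 1, 1, 2, 1, 1],
    "type":  ["f", "t", "t", "o", "o", "f"],
    "value": [9, 2, 5, 4, 3, 7],
})

# Priority 1: a higher number means a more important type (t > o > f).
prio_dic = {"t": 3, "o": 2, "f": 1}

res = (
    data.assign(coded_type=data["type"].map(prio_dic))      # temporary rank column
        .sort_values(["coded_type", "value"], ascending=False)  # best row first
        .groupby(["name", "time"], as_index=False)
        .first()                                             # keep the best row per group
        .drop(columns="coded_type")                          # remove the helper column
)
print(res)
```

With this sample, group (a, 1) keeps type 't' with value 5 even though an 'f' row has value 9, because type priority is applied before value. One thing to watch: any type missing from prio_dic is mapped to NaN and ends up sorted last.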