I am trying to extract max value and respective key from multiple dictionary key & values inside an object column in python dataframe, can anyone help me out
df:
Obj_values
0 {"Sony":25,"Max":91,"James":55}
1 {"Jack":12,"Max":10}
2 {}
3
4 {"Halk":21,"Hall":15}
5 {"H":1}
6 {"Aws":"k","Az":113,"Gc":"b"}
7 {"Max":60,"HBO":113,"Sony":55, "WS":1256,"James":55}
I have tried something like below, but still not able to convert the dataframe to dict:
obj_lst = df["Obj_values"].tolist()
df["Obj_values"] = ast.literal_eval(str(obj_lst))
df["Obj_values"].dtype
# dtype('O')
for k,v in df["Obj_values"].items():
print(k,v)
CodePudding user response:
Probably a bit slow because your data are messy:
- empty row
- empty dict
- string values
Try:
def f_max(d):
k = v = ''
if d:
d = {k: v for k, v in d.items() if isinstance(v, (int, float))}
if d:
k = max(d, key=d.get)
v = d[k]
return pd.Series({'Name': k, 'Value': v})
df = pd.concat([df, df['Obj_values'].apply(f_max)], axis=1)
Output:
>>> df
Obj_values Name Value
0 {'Sony': 25, 'Max': 91, 'James': 55} Max 91
1 {'Jack': 12, 'Max': 10} Jack 12
2 {}
4
5 {'Halk': 21, 'Hall': 15} Halk 21
6 {'H': 1} H 1
7 {'Aws': 'k', 'Az': 113, 'Gc': 'b'} Az 113
8 {'Max': 60, 'HBO': 113, 'Sony': 55, 'WS': 1256... WS 1256
CodePudding user response:
You have the right idea parsing the dictionary using ast.literal_eval
. There are various ways you could work with the dataframe, but once you have got your list of dict
objects in obj_list
it may be simpler to loop over the list to get the names and values for each row and append them as dataframe columns at the end. That's the approach I've gone for here.
dict.items
gives you a list of (key, value) tuples, so you need need to find the maximum of these with the value as the key. This has been discussed in the following threads:
How to find the maximum value in a list of tuples?
Finding max value in the second column of a nested list?
There are various ways to do it but I have gone with using a lambda function for the key parameter (name, value = max(obj_as_dict.items(), key=lambda x: x[1])
).
Full code below. I just took the first two rows of your dataframe to test it out.
import pandas as pd
import ast
df = pd.DataFrame(['{"Sony": 25,"Max":91,"James":55}', '{"Jack":12,"Max":10}'])
df.columns = ["Obj_values"]
obj_list = df['Obj_values'].tolist()
names = []
values = []
for obj in obj_list:
obj_as_dict = ast.literal_eval(str(obj))
name, value = max(obj_as_dict.items(), key=lambda x: x[1])
names.append(name)
values.append(value)
df['Name'] = names
df['Value'] = values
print(df)
Output:
Obj_values Name Value
0 {"Sony": 25,"Max":91,"James":55} Max 91
1 {"Jack":12,"Max":10} Jack 12