How calculate difference between one column value and list of jsons values in other column, and crea-CodePudding

I have two columns in a Pandas Dataframe:

value_1 values_2
10      [{'name': 'eric', 'count':20}, {'name': 'john', 'count':30}]
20      [{'name': 'james', 'count':20}, {'name': 'jamie', 'count':35}]

I would like to create a function that creates 2 columns in the same dataframe. Here is what I try to have in the function:

Calculates differences in each row between 'count' key values in column values_2 and value in column 'value_1'.
In the new columns keep only the lowest name difference, and in the other the lowest difference value. example for row 1: for eric 20 - 10 and for john 30 - 10 The lowest is for eric, I would like to give his name as a value in a new column and the lowest difference value in another.

Expected output:

value_1   values_2                                                       lowest  lowest_difference
    10    [{'name': 'eric', 'count':20}, {'name': 'john', 'count':30}]     eric    10 (i.e. 20 -10)
    20    [{'name': 'james', 'count':20}, {'name': 'jamie', 'count':35}]   james   0 (i.e. 20-20)

I know I can do "apply" and/or "for loops" but don't know how do it in an elegant way. How can I do it ?

CodePudding user response：

You may check with explode , then we just need to find the min diff within each group by idxmin

s = df.explode('values_2').reset_index()
idx = s.values_2.str.get('count').sub(s['value_1']).groupby(s['index']).idxmin()
df['new'] = s.values_2.str.get('name').loc[idx].values
df
Out[594]: 
                                            values_2  value_1    new
0  [{'name': 'eric', 'count': 20}, {'name': 'john...       10   eric
1  [{'name': 'james', 'count': 20}, {'name': 'jam...       20  james

CodePudding user response：

You can use a custom function:

def get_min(r):
    l = r['values_2']
    MIN = min(range(len(l)), key=lambda d: l[d]['count']-r['value_1'])
    return pd.Series({'lowest': l[MIN]['name'], 'lowest_difference': l[MIN]['count']-r['value_1']})
    
df.join(df.apply(get_min, axis=1))

output:

   value_1                                                          values_2 lowest  lowest_difference
0       10    [{'name': 'eric', 'count': 20}, {'name': 'john', 'count': 30}]   eric                 10
1       20  [{'name': 'james', 'count': 20}, {'name': 'jamie', 'count': 35}]  james                  0