Home > Enterprise >  How calculate difference between one column value and list of jsons values in other column, and crea
How calculate difference between one column value and list of jsons values in other column, and crea

Time:03-07

I have two columns in a Pandas Dataframe:

value_1 values_2
10      [{'name': 'eric', 'count':20}, {'name': 'john', 'count':30}]
20      [{'name': 'james', 'count':20}, {'name': 'jamie', 'count':35}]

I would like to create a function that creates 2 columns in the same dataframe. Here is what I try to have in the function:

  1. Calculates differences in each row between 'count' key values in column values_2 and value in column 'value_1'.
  2. In the new columns keep only the lowest name difference, and in the other the lowest difference value. example for row 1: for eric 20 - 10 and for john 30 - 10 The lowest is for eric, I would like to give his name as a value in a new column and the lowest difference value in another.

Expected output:

value_1   values_2                                                       lowest  lowest_difference
    10    [{'name': 'eric', 'count':20}, {'name': 'john', 'count':30}]     eric    10 (i.e. 20 -10)
    20    [{'name': 'james', 'count':20}, {'name': 'jamie', 'count':35}]   james   0 (i.e. 20-20)

I know I can do "apply" and/or "for loops" but don't know how do it in an elegant way. How can I do it ?

CodePudding user response:

You may check with explode , then we just need to find the min diff within each group by idxmin

s = df.explode('values_2').reset_index()
idx = s.values_2.str.get('count').sub(s['value_1']).groupby(s['index']).idxmin()
df['new'] = s.values_2.str.get('name').loc[idx].values
df
Out[594]: 
                                            values_2  value_1    new
0  [{'name': 'eric', 'count': 20}, {'name': 'john...       10   eric
1  [{'name': 'james', 'count': 20}, {'name': 'jam...       20  james

CodePudding user response:

You can use a custom function:

def get_min(r):
    l = r['values_2']
    MIN = min(range(len(l)), key=lambda d: l[d]['count']-r['value_1'])
    return pd.Series({'lowest': l[MIN]['name'], 'lowest_difference': l[MIN]['count']-r['value_1']})
    
df.join(df.apply(get_min, axis=1))

output:

   value_1                                                          values_2 lowest  lowest_difference
0       10    [{'name': 'eric', 'count': 20}, {'name': 'john', 'count': 30}]   eric                 10
1       20  [{'name': 'james', 'count': 20}, {'name': 'jamie', 'count': 35}]  james                  0
  • Related