I have two columns in a Pandas Dataframe:
value_1 values_2
10 [{'name': 'eric', 'count':20}, {'name': 'john', 'count':30}]
20 [{'name': 'james', 'count':20}, {'name': 'jamie', 'count':35}]
I would like to create a function that creates 2 columns in the same dataframe. Here is what I try to have in the function:
- Calculates differences in each row between 'count' key values in column values_2 and value in column 'value_1'.
- In the new columns keep only the lowest name difference, and in the other the lowest difference value. example for row 1: for eric 20 - 10 and for john 30 - 10 The lowest is for eric, I would like to give his name as a value in a new column and the lowest difference value in another.
Expected output:
value_1 values_2 lowest lowest_difference
10 [{'name': 'eric', 'count':20}, {'name': 'john', 'count':30}] eric 10 (i.e. 20 -10)
20 [{'name': 'james', 'count':20}, {'name': 'jamie', 'count':35}] james 0 (i.e. 20-20)
I know I can do "apply" and/or "for loops" but don't know how do it in an elegant way. How can I do it ?
CodePudding user response:
You may check with explode
, then we just need to find the min diff within each group by idxmin
s = df.explode('values_2').reset_index()
idx = s.values_2.str.get('count').sub(s['value_1']).groupby(s['index']).idxmin()
df['new'] = s.values_2.str.get('name').loc[idx].values
df
Out[594]:
values_2 value_1 new
0 [{'name': 'eric', 'count': 20}, {'name': 'john... 10 eric
1 [{'name': 'james', 'count': 20}, {'name': 'jam... 20 james
CodePudding user response:
You can use a custom function:
def get_min(r):
l = r['values_2']
MIN = min(range(len(l)), key=lambda d: l[d]['count']-r['value_1'])
return pd.Series({'lowest': l[MIN]['name'], 'lowest_difference': l[MIN]['count']-r['value_1']})
df.join(df.apply(get_min, axis=1))
output:
value_1 values_2 lowest lowest_difference
0 10 [{'name': 'eric', 'count': 20}, {'name': 'john', 'count': 30}] eric 10
1 20 [{'name': 'james', 'count': 20}, {'name': 'jamie', 'count': 35}] james 0