Home > Mobile >  How to return the max of a series in a conditional lambda statement
How to return the max of a series in a conditional lambda statement

Time:10-02

I am using the following code to return the number of rows where the NID and Fdat in df1 and df2 are the same and the Tdate in df2 is less than the RxDate in df1.

def breedings(row):
    
    row['TLSCC'] = len(df2[((df2['NID']==row['NID']) & (df2['Fdat']==row['Fdat']) & (df2['Tdate'] < row['RxDate']) )] )
     
    return row

df1 = df1.apply(lambda x:breedings(x),axis=1)

The result is place in a new column TLSCC in df1

I would like to search the records in df2 to find the maximum SCC value using the same conditions that were used to determine the number of rows that met the conditions. ie I would like to return the max.scc value from these rows to a new column in df1.

Replacing len with max does not work as it does not link to SCC. I have also tried putting max into the apply lambda statement. Without success.

Any pointers appreciated.

The data for df1 and df2 is provided below.

df1

{'NID': {1: '212017463',
  2: '212017463',
  3: '212017463',
  4: '213015177',
  5: '213015177',
  6: '213015177',
  7: '213015177',
  8: '213015177',
  9: '213015177',
  10: '213015177'},
 'Fdat': {1: Timestamp('2020-08-06 00:00:00'),
  2: Timestamp('2020-08-06 00:00:00'),
  3: Timestamp('2020-08-06 00:00:00'),
  4: Timestamp('2018-08-30 00:00:00'),
  5: Timestamp('2018-08-30 00:00:00'),
  6: Timestamp('2018-08-30 00:00:00'),
  7: Timestamp('2018-08-30 00:00:00'),
  8: Timestamp('2018-08-30 00:00:00'),
  9: Timestamp('2018-08-30 00:00:00'),
  10: Timestamp('2018-08-30 00:00:00')},
 'RxDate': {1: Timestamp('2020-09-30 00:00:00'),
  2: Timestamp('2020-10-12 00:00:00'),
  3: Timestamp('2020-10-15 00:00:00'),
  4: Timestamp('2018-12-06 00:00:00'),
  5: Timestamp('2018-12-07 00:00:00'),
  6: Timestamp('2018-12-09 00:00:00'),
  7: Timestamp('2018-12-11 00:00:00'),
  8: Timestamp('2018-12-17 00:00:00'),
  9: Timestamp('2018-12-18 00:00:00'),
  10: Timestamp('2018-12-19 00:00:00')}}

df2

{'NID': {36: '212017463',
  37: '212017463',
  38: '212017463',
  39: '212017463',
  40: '212017463',
  41: '212017463',
  42: '212017463',
  43: '212017463',
  44: '212017463',
  45: '212017463',
  46: '212017463',
  47: '212017463',
  48: '212017463',
  49: '212017463',
  50: '212017463',
  51: '212017463',
  52: '212017463',
  53: '212017463',
  54: '212017463',
  55: '212017463',
  56: '212017463',
  57: '212017463',
  58: '212017463',
  59: '212017463',
  60: '212017463',
  61: '212017463',
  62: '212017463',
  63: '212017463',
  64: '212017463',
  65: '212017463',
  66: '212017463',
  67: '212017463',
  68: '212017463',
  69: '212017463',
  70: '213015177',
  71: '213015177',
  72: '213015177',
  73: '213015177',
  74: '213015177',
  75: '213015177',
  76: '213015177',
  77: '213015177',
  78: '213015177',
  79: '213015177',
  80: '213015177',
  81: '213015177',
  82: '213015177',
  83: '213015177',
  84: '213015177',
  85: '213015177',
  86: '213015177',
  87: '213015177',
  88: '213015177',
  89: '213015177',
  90: '213015177',
  91: '213015177',
  92: '213015177',
  93: '213015177',
  94: '213015177',
  95: '213015177',
  96: '213015177',
  97: '213015177',
  98: '213015177',
  99: '213015177',
  100: '213015177',
  101: '213015177'},
 'Fdat': {36: Timestamp('2014-08-17 00:00:00'),
  37: Timestamp('2014-08-17 00:00:00'),
  38: Timestamp('2014-08-17 00:00:00'),
  39: Timestamp('2014-08-17 00:00:00'),
  40: Timestamp('2014-08-17 00:00:00'),
  41: Timestamp('2015-09-22 00:00:00'),
  42: Timestamp('2015-09-22 00:00:00'),
  43: Timestamp('2015-09-22 00:00:00'),
  44: Timestamp('2015-09-22 00:00:00'),
  45: Timestamp('2015-09-22 00:00:00'),
  46: Timestamp('2015-09-22 00:00:00'),
  47: Timestamp('2016-09-01 00:00:00'),
  48: Timestamp('2016-09-01 00:00:00'),
  49: Timestamp('2016-09-01 00:00:00'),
  50: Timestamp('2016-09-01 00:00:00'),
  51: Timestamp('2016-09-01 00:00:00'),
  52: Timestamp('2017-08-31 00:00:00'),
  53: Timestamp('2017-08-31 00:00:00'),
  54: Timestamp('2017-08-31 00:00:00'),
  55: Timestamp('2017-08-31 00:00:00'),
  56: Timestamp('2017-08-31 00:00:00'),
  57: Timestamp('2017-08-31 00:00:00'),
  58: Timestamp('2018-09-17 00:00:00'),
  59: Timestamp('2018-09-17 00:00:00'),
  60: Timestamp('2018-09-17 00:00:00'),
  61: Timestamp('2018-09-17 00:00:00'),
  62: Timestamp('2018-09-17 00:00:00'),
  63: Timestamp('2019-08-25 00:00:00'),
  64: Timestamp('2019-08-25 00:00:00'),
  65: Timestamp('2019-08-25 00:00:00'),
  66: Timestamp('2019-08-25 00:00:00'),
  67: Timestamp('2019-08-25 00:00:00'),
  68: Timestamp('2020-08-06 00:00:00'),
  69: Timestamp('2020-08-06 00:00:00'),
  70: Timestamp('2015-09-01 00:00:00'),
  71: Timestamp('2015-09-01 00:00:00'),
  72: Timestamp('2015-09-01 00:00:00'),
  73: Timestamp('2015-09-01 00:00:00'),
  74: Timestamp('2015-09-01 00:00:00'),
  75: Timestamp('2015-09-01 00:00:00'),
  76: Timestamp('2016-08-18 00:00:00'),
  77: Timestamp('2016-08-18 00:00:00'),
  78: Timestamp('2016-08-18 00:00:00'),
  79: Timestamp('2016-08-18 00:00:00'),
  80: Timestamp('2016-08-18 00:00:00'),
  81: Timestamp('2017-09-01 00:00:00'),
  82: Timestamp('2017-09-01 00:00:00'),
  83: Timestamp('2017-09-01 00:00:00'),
  84: Timestamp('2017-09-01 00:00:00'),
  85: Timestamp('2017-09-01 00:00:00'),
  86: Timestamp('2017-09-01 00:00:00'),
  87: Timestamp('2018-08-30 00:00:00'),
  88: Timestamp('2018-08-30 00:00:00'),
  89: Timestamp('2018-08-30 00:00:00'),
  90: Timestamp('2018-08-30 00:00:00'),
  91: Timestamp('2018-08-30 00:00:00'),
  92: Timestamp('2019-08-09 00:00:00'),
  93: Timestamp('2019-08-09 00:00:00'),
  94: Timestamp('2019-08-09 00:00:00'),
  95: Timestamp('2019-08-09 00:00:00'),
  96: Timestamp('2019-08-09 00:00:00'),
  97: Timestamp('2020-08-06 00:00:00'),
  98: Timestamp('2020-08-06 00:00:00'),
  99: Timestamp('2020-08-06 00:00:00'),
  100: Timestamp('2020-08-06 00:00:00'),
  101: Timestamp('2020-08-06 00:00:00')},
 'Tdate': {36: Timestamp('2015-04-16 00:00:00'),
  37: Timestamp('2015-02-13 00:00:00'),
  38: Timestamp('2014-10-10 00:00:00'),
  39: Timestamp('2014-12-17 00:00:00'),
  40: Timestamp('2014-11-07 00:00:00'),
  41: Timestamp('2015-09-29 00:00:00'),
  42: Timestamp('2016-05-30 00:00:00'),
  43: Timestamp('2016-02-10 00:00:00'),
  44: Timestamp('2016-04-12 00:00:00'),
  45: Timestamp('2015-11-06 00:00:00'),
  46: Timestamp('2015-12-18 00:00:00'),
  47: Timestamp('2017-05-31 00:00:00'),
  48: Timestamp('2016-11-28 00:00:00'),
  49: Timestamp('2017-04-04 00:00:00'),
  50: Timestamp('2017-02-15 00:00:00'),
  51: Timestamp('2016-10-03 00:00:00'),
  52: Timestamp('2018-05-29 00:00:00'),
  53: Timestamp('2018-03-20 00:00:00'),
  54: Timestamp('2018-04-24 00:00:00'),
  55: Timestamp('2018-02-05 00:00:00'),
  56: Timestamp('2017-10-03 00:00:00'),
  57: Timestamp('2017-11-28 00:00:00'),
  58: Timestamp('2019-06-03 00:00:00'),
  59: Timestamp('2019-04-10 00:00:00'),
  60: Timestamp('2019-01-14 00:00:00'),
  61: Timestamp('2018-10-02 00:00:00'),
  62: Timestamp('2018-11-14 00:00:00'),
  63: Timestamp('2020-03-24 00:00:00'),
  64: Timestamp('2020-05-21 00:00:00'),
  65: Timestamp('2019-11-14 00:00:00'),
  66: Timestamp('2020-01-21 00:00:00'),
  67: Timestamp('2019-09-27 00:00:00'),
  68: Timestamp('2020-09-30 00:00:00'),
  69: Timestamp('2020-12-16 00:00:00'),
  70: Timestamp('2015-09-29 00:00:00'),
  71: Timestamp('2016-04-12 00:00:00'),
  72: Timestamp('2016-05-30 00:00:00'),
  73: Timestamp('2016-02-10 00:00:00'),
  74: Timestamp('2015-11-06 00:00:00'),
  75: Timestamp('2015-12-18 00:00:00'),
  76: Timestamp('2017-04-04 00:00:00'),
  77: Timestamp('2017-05-31 00:00:00'),
  78: Timestamp('2017-02-15 00:00:00'),
  79: Timestamp('2016-11-28 00:00:00'),
  80: Timestamp('2016-10-03 00:00:00'),
  81: Timestamp('2018-05-29 00:00:00'),
  82: Timestamp('2018-03-20 00:00:00'),
  83: Timestamp('2018-04-24 00:00:00'),
  84: Timestamp('2017-10-03 00:00:00'),
  85: Timestamp('2018-02-05 00:00:00'),
  86: Timestamp('2017-11-28 00:00:00'),
  87: Timestamp('2019-04-10 00:00:00'),
  88: Timestamp('2019-06-03 00:00:00'),
  89: Timestamp('2019-01-14 00:00:00'),
  90: Timestamp('2018-11-14 00:00:00'),
  91: Timestamp('2018-10-02 00:00:00'),
  92: Timestamp('2020-05-21 00:00:00'),
  93: Timestamp('2020-03-24 00:00:00'),
  94: Timestamp('2019-11-14 00:00:00'),
  95: Timestamp('2020-01-21 00:00:00'),
  96: Timestamp('2019-09-27 00:00:00'),
  97: Timestamp('2021-05-13 00:00:00'),
  98: Timestamp('2021-01-27 00:00:00'),
  99: Timestamp('2021-03-18 00:00:00'),
  100: Timestamp('2020-09-30 00:00:00'),
  101: Timestamp('2020-12-16 00:00:00')},
 'SCC': {36: 26,
  37: 853,
  38: 66,
  39: 39,
  40: 18,
  41: 1704,
  42: 16,
  43: 13,
  44: 23,
  45: 51,
  46: 15,
  47: 95,
  48: 21,
  49: 51,
  50: 57,
  51: 65,
  52: 55,
  53: 77,
  54: 53,
  55: 65,
  56: 17,
  57: 51,
  58: 113,
  59: 57,
  60: 58,
  61: 520,
  62: 735,
  63: 50,
  64: 98,
  65: 12,
  66: 109,
  67: 19,
  68: 6939,
  69: 1891,
  70: 166,
  71: 15,
  72: 25,
  73: 14,
  74: 34,
  75: 24,
  76: 45,
  77: 74,
  78: 17,
  79: 3,
  80: 7,
  81: 32,
  82: 27,
  83: 122,
  84: 4,
  85: 62,
  86: 7,
  87: 35,
  88: 74,
  89: 117,
  90: 110,
  91: 2,
  92: 55,
  93: 316,
  94: 17,
  95: 19,
  96: 8,
  97: 77,
  98: 12,
  99: 50,
  100: 16,
  101: 19}}

CodePudding user response:

I would first join and then apply a condition on the dates to generate the joined table that matches your conditions:

>>> join = df1.merge(df2, on=['NID', 'Fdat'])
>>> join = join[join['Tdate'] < join['RxDate']]
>>> join
          NID       Fdat     RxDate      Tdate   SCC
2   212017463 2020-08-06 2020-10-12 2020-09-30  6939
4   212017463 2020-08-06 2020-10-15 2020-09-30  6939
9   213015177 2018-08-30 2018-12-06 2018-11-14   110
10  213015177 2018-08-30 2018-12-06 2018-10-02     2
14  213015177 2018-08-30 2018-12-07 2018-11-14   110
15  213015177 2018-08-30 2018-12-07 2018-10-02     2
19  213015177 2018-08-30 2018-12-09 2018-11-14   110
20  213015177 2018-08-30 2018-12-09 2018-10-02     2
24  213015177 2018-08-30 2018-12-11 2018-11-14   110
25  213015177 2018-08-30 2018-12-11 2018-10-02     2
29  213015177 2018-08-30 2018-12-17 2018-11-14   110
30  213015177 2018-08-30 2018-12-17 2018-10-02     2
34  213015177 2018-08-30 2018-12-18 2018-11-14   110
35  213015177 2018-08-30 2018-12-18 2018-10-02     2
39  213015177 2018-08-30 2018-12-19 2018-11-14   110
40  213015177 2018-08-30 2018-12-19 2018-10-02     2

Then a groupby on that table gives you all you want, the number of records with .size() and the max with ['SCC'].max():

>>> join.groupby(df1.columns.to_list()).size().rename('TLSCC')
NID        Fdat        RxDate    
212017463  2020-08-06  2020-10-12    1
                       2020-10-15    1
213015177  2018-08-30  2018-12-06    2
                       2018-12-07    2
                       2018-12-09    2
                       2018-12-11    2
                       2018-12-17    2
                       2018-12-18    2
                       2018-12-19    2
Name: TLSCC, dtype: int64
>>> join.groupby(df1.columns.to_list())['SCC'].max()
NID        Fdat        RxDate    
212017463  2020-08-06  2020-10-12    6939
                       2020-10-15    6939
213015177  2018-08-30  2018-12-06     110
                       2018-12-07     110
                       2018-12-09     110
                       2018-12-11     110
                       2018-12-17     110
                       2018-12-18     110
                       2018-12-19     110

You can then put that all back in df1 with .merge:

>>> group = join.groupby(df1.columns.to_list())
>>> df1 = df1.merge(group.size().rename('TLSCC').reset_index())\
...          .merge(group['SCC'].max().rename('max.scc').reset_index())
...
>>> df1
         NID       Fdat     RxDate  TLSCC  max.scc
0  212017463 2020-08-06 2020-10-12      1     6939
1  212017463 2020-08-06 2020-10-15      1     6939
2  213015177 2018-08-30 2018-12-06      2      110
3  213015177 2018-08-30 2018-12-07      2      110
4  213015177 2018-08-30 2018-12-09      2      110
5  213015177 2018-08-30 2018-12-11      2      110
6  213015177 2018-08-30 2018-12-17      2      110
7  213015177 2018-08-30 2018-12-18      2      110
8  213015177 2018-08-30 2018-12-19      2      110
  • Related