How to use a row value and a row value plus a number in a conditional statement to find the max of a

Time:08-31

Previously I have utilized the following code to create a value based on a value in a subsequent row.

demo["NFdat"] = demo.groupby('NID')['Fdat'].shift(-1)

This code assigns the "Fdat" from the next row to "NFdat" in the current row, within each NID group.
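As a minimal illustration of that grouped shift, here is a sketch on made-up data (the NID and Fdat values below are hypothetical, not taken from the real dataset):

```python
import pandas as pd

# Hypothetical toy data: two animals (NID) with a few Fdat values each.
demo = pd.DataFrame({
    "NID": [1, 1, 1, 2, 2],
    "Fdat": [10, 20, 30, 40, 50],
})

# shift(-1) pulls the value from the following row of the same NID group,
# so the last row of every group has no successor and gets NaN.
demo["NFdat"] = demo.groupby("NID")["Fdat"].shift(-1)
print(demo)
```

Because the shift happens inside each group, the last row of NID 1 does not pick up the first Fdat of NID 2.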

I would like to do something similar: assign to a variable in the current row the maximum value from subsequent rows that share the same "NID" but belong to the cow's next lactation, effectively Lact + 1.

The example data is presented below. I would like to determine the maximum Lact_xmast value in the subsequent lactation (Lact) and store the value in a new variable Next_Lact_max_xmast.

           NID  Lact  Lact_xmast
770  207018229     2           1
771  207018229     2           1
772  207018229     3           1
773  207018229     3           1
774  207018229     3           1
775  207018229     3           2
776  207018229     4           1
777  207018229     4           1
778  207018229     4           2
779  207018229     4           2
780  207018229     4           3
781  207018229     4           3
782  207018229     4           3

The output that I would like to achieve is

           NID  Lact  Lact_xmast  Next_Lact_max_xmast
770  207018229     2           1         2
771  207018229     2           1         2
772  207018229     3           1         3
773  207018229     3           1         3
774  207018229     3           1         3
775  207018229     3           2         3
776  207018229     4           1         NA
777  207018229     4           1         NA
778  207018229     4           2         NA
779  207018229     4           2         NA
780  207018229     4           3         NA
781  207018229     4           3         NA
782  207018229     4           3         NA

CodePudding user response:

Here's one way to do it:

# For current lactation, get max Lact_xmast for next lactation
max_lact_xmas = df.groupby('Lact')['Lact_xmast'].max().shift(-1)

# Left join the resulting max_lact_xmas Series to original dataframe.
# For the merge condition, we use column from the original dataframe and index from series.
df.merge(max_lact_xmas, left_on='Lact', right_index=True, how='left')

           NID  Lact  Lact_xmast_x  Lact_xmast_y
770  207018229     2             1           2.0
771  207018229     2             1           2.0
772  207018229     3             1           3.0
773  207018229     3             1           3.0
774  207018229     3             1           3.0
775  207018229     3             2           3.0
776  207018229     4             1           NaN
777  207018229     4             1           NaN
778  207018229     4             2           NaN
779  207018229     4             2           NaN
780  207018229     4             3           NaN
781  207018229     4             3           NaN
782  207018229     4             3           NaN
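Note that the snippet above groups by "Lact" only, which works here because the sample contains a single NID. With several animals in the frame, one possible variation (a sketch, not the answerer's code) is to group by both NID and Lact and shift within each NID. The variable names `per_lact_max`, `next_max` and `result` are made up for illustration, and the data is rebuilt from the question's example:

```python
import pandas as pd

# Example data reconstructed from the question.
df = pd.DataFrame({
    "NID": [207018229] * 13,
    "Lact": [2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4],
    "Lact_xmast": [1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 3, 3, 3],
}, index=range(770, 783))

# Max Lact_xmast per (NID, Lact); groupby sorts the keys by default.
per_lact_max = df.groupby(["NID", "Lact"])["Lact_xmast"].max()

# Shift within each NID so a lactation picks up the max of the next
# recorded lactation of the same animal.
next_max = (
    per_lact_max.groupby(level="NID").shift(-1)
    .rename("Next_Lact_max_xmast")
)

# Left join on the (NID, Lact) keys; rows of the last lactation get NaN.
result = df.join(next_max, on=["NID", "Lact"])
print(result)
```

Naming the Series before joining also avoids the `_x`/`_y` suffixes. One caveat: the shift picks the next recorded lactation, so if an animal skips a lactation number this is not the same as Lact + 1.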

CodePudding user response:

Sort by "Lact" just to be cleaner (not needed):

df = df.sort_values("Lact", ascending=True)

Create a label for joining on "Lact" + 1:

df["NextLact"] = df["Lact"] + 1

Compute the max "Lact_xmast" for each "Lact":

df_grouped = df.groupby(["Lact"], as_index=False).Lact_xmast.max()\
    .rename(columns={"Lact_xmast":"Next_Lact_max_xmast", "Lact":"NextLact"})

Join on "NextLact" to attach the max of the grouped "Lact_xmast":

df.merge(df_grouped, on="NextLact", how="left")
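The steps of this answer can be combined into one runnable sketch. It assumes the frame may hold several animals, so NID is added to the grouping and join keys (that part is an extension, not in the answer as posted), and the toy data and the names `lact_max` and `out` are hypothetical:

```python
import pandas as pd

# Hypothetical toy data: animal 2 has no recorded lactation 2.
df = pd.DataFrame({
    "NID": [1, 1, 1, 1, 2, 2],
    "Lact": [1, 1, 2, 3, 1, 3],
    "Lact_xmast": [1, 2, 3, 1, 2, 4],
})

# Max per animal and lactation, relabelled so it can be matched
# against the current row's Lact + 1.
lact_max = (
    df.groupby(["NID", "Lact"], as_index=False)["Lact_xmast"].max()
      .rename(columns={"Lact": "NextLact",
                       "Lact_xmast": "Next_Lact_max_xmast"})
)

out = (
    df.assign(NextLact=df["Lact"] + 1)                      # join label
      .merge(lact_max, on=["NID", "NextLact"], how="left")  # look up next lactation
      .drop(columns="NextLact")                             # helper no longer needed
)
print(out)
```

Since the join key is literally Lact + 1, an animal with a gap in its lactation numbers gets NaN for the row before the gap. Merging on columns also returns a fresh 0-based index, so the original row labels would need to be carried along (for example via `reset_index()` before the merge) if they matter.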