Previously I used the following code to create a value based on a value in the next row.
demo["NFdat"] = demo.groupby('NID')['Fdat'].shift(-1)
This code assigns the "Fdat" from the next row to "NFdat" in the current row, within each NID group.
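For reference, a minimal sketch of what that grouped shift does. The NID and Fdat values here are made up for illustration and are not from my real data.

```python
import pandas as pd

# Hypothetical data: two animals (NID) with made-up Fdat values.
demo = pd.DataFrame({
    "NID": [1, 1, 1, 2, 2],
    "Fdat": ["2020-01-01", "2020-02-01", "2020-03-01", "2021-05-01", "2021-06-01"],
})

# Within each NID, pull the Fdat of the following row into the current row.
# The last row of each NID has no following row, so it gets a missing value.
demo["NFdat"] = demo.groupby("NID")["Fdat"].shift(-1)
print(demo)
```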
I would like to do something similar: assign to a variable in the current row the maximum value from subsequent rows that share the same "NID" but belong to the cow's next lactation, effectively Lact + 1.
The example data is presented below. I would like to determine the maximum Lact_xmast value in the subsequent lactation (Lact) and store the value in a new variable Next_Lact_max_xmast.
NID Lact Lact_xmast
770 207018229 2 1
771 207018229 2 1
772 207018229 3 1
773 207018229 3 1
774 207018229 3 1
775 207018229 3 2
776 207018229 4 1
777 207018229 4 1
778 207018229 4 2
779 207018229 4 2
780 207018229 4 3
781 207018229 4 3
782 207018229 4 3
The output that I would like to achieve is
NID Lact Lact_xmast Next_Lact_max_xmast
770 207018229 2 1 2
771 207018229 2 1 2
772 207018229 3 1 3
773 207018229 3 1 3
774 207018229 3 1 3
775 207018229 3 2 3
776 207018229 4 1 NA
777 207018229 4 1 NA
778 207018229 4 2 NA
779 207018229 4 2 NA
780 207018229 4 3 NA
781 207018229 4 3 NA
782 207018229 4 3 NA
CodePudding user response:
Here's one way to do it:
# For current lactation, get max Lact_xmast for next lactation
max_lact_xmas = df.groupby('Lact')['Lact_xmast'].max().shift(-1)
# Left join the resulting max_lact_xmas Series to original dataframe.
# For the merge condition, we use column from the original dataframe and index from series.
df.merge(max_lact_xmas, left_on='Lact', right_index=True, how='left')
NID Lact Lact_xmast_x Lact_xmast_y
770 207018229 2 1 2.0
771 207018229 2 1 2.0
772 207018229 3 1 3.0
773 207018229 3 1 3.0
774 207018229 3 1 3.0
775 207018229 3 2 3.0
776 207018229 4 1 NaN
777 207018229 4 1 NaN
778 207018229 4 2 NaN
779 207018229 4 2 NaN
780 207018229 4 3 NaN
781 207018229 4 3 NaN
782 207018229 4 3 NaN
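Note that grouping on 'Lact' alone is only safe when the frame holds a single NID, as in the sample. If there are several animals, a possible variation is to take the max per (NID, Lact) and shift within each NID. This is a sketch, not tested on the real data; the second animal (NID 999) and the name Next_Lact_max_xmast for the new column are assumptions added for illustration.

```python
import pandas as pd

# Sample rows from the question plus a hypothetical second animal (NID 999).
df = pd.DataFrame({
    "NID": [207018229] * 13 + [999] * 3,
    "Lact": [2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 1, 2, 2],
    "Lact_xmast": [1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 3, 3, 3, 1, 1, 4],
})

# Max per (NID, Lact), then shift within each NID so that a lactation
# sees the max of the next recorded lactation of the same animal.
per_lact_max = df.groupby(["NID", "Lact"])["Lact_xmast"].max()
next_max = (
    per_lact_max.groupby(level="NID").shift(-1).rename("Next_Lact_max_xmast")
)

# Join back on the (NID, Lact) index; the original row order is kept.
result = df.join(next_max, on=["NID", "Lact"])
print(result)
```

Because this uses shift, "next" means the next lactation present in the data for that animal, even if the lactation numbers skip a value.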
CodePudding user response:
Sort by "Lact" just to be cleaner (not needed):
df = df.sort_values("Lact", ascending=True)
Create a label for joining on "Lact" + 1:
df["NextLact"] = df["Lact"] + 1
Compute the max "Lact_xmast" for each "Lact":
df_grouped = df.groupby(["Lact"], as_index=False).Lact_xmast.max()\
.rename(columns={"Lact_xmast":"Next_Lact_max_xmast", "Lact":"NextLact"})
Join on "NextLact" to bring in the max of the grouped "Lact_xmast":
df.merge(df_grouped, on="NextLact", how="left")
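The steps above can be put together as a runnable sketch on the sample data. Two things here are my own additions and not part of the steps as written: NID is included in the grouping and join keys (an assumption, so that animals are not mixed if the frame holds more than one), and the helper column NextLact is dropped at the end.

```python
import pandas as pd

# Sample data from the question (the original row labels 770-782 are omitted).
df = pd.DataFrame({
    "NID": [207018229] * 13,
    "Lact": [2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4],
    "Lact_xmast": [1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 3, 3, 3],
})

# Label each row with the lactation it should look up.
df["NextLact"] = df["Lact"] + 1

# Max per (NID, Lact), renamed so that "Lact" becomes the lookup key.
df_grouped = (
    df.groupby(["NID", "Lact"], as_index=False)["Lact_xmast"].max()
      .rename(columns={"Lact_xmast": "Next_Lact_max_xmast", "Lact": "NextLact"})
)

# Left join keeps every original row; rows with no next lactation get NaN.
result = (
    df.merge(df_grouped, on=["NID", "NextLact"], how="left")
      .drop(columns="NextLact")
)
print(result)
```

Unlike a shift-based approach, this join matches strictly on Lact + 1, so a lactation whose immediate successor is missing from the data gets NaN rather than the max of a later lactation.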