Correct way to iterate over two dataframes to set specific values based on the value of another df

Edited to add an easier-to-reproduce dataframe.

I have two dataframes that look something like this:

df1

import numpy as np
import pandas as pd

index = [0,1,2,3,4,5,6,7,8]
a = pd.Series(["John Smith", "John Smith", "John Smith", "Kobe Bryant", "Kobe Bryant", "Kobe Bryant", "Jeff Daniels", "Jeff Daniels", "Jeff Daniels"], index=index)
b = pd.Series(["7/29/2022", "8/7/2022", "8/29/2022", "7/9/2022", "7/29/2022", "8/9/2022", "7/28/2022", "8/8/2022", "8/28/2022"], index=index)
c = pd.Series([185, 187, 186.5, 212.5, 217.5, 220.5, 211.1, 210.5, 213], index=index)
d = pd.Series(np.nan, index=index)  # Goal starts out empty (all NaN)
df1 = pd.DataFrame({"Name": a, "Date": b, "Weight": c, "Goal": d})

or df1 in this format:

Name	Date	Weight	Goal
John Smith	7/29/2022	185	NaN
John Smith	8/7/2022	187	NaN
John Smith	8/29/2022	186.5	NaN
Kobe Bryant	7/9/2022	212.5	NaN
Kobe Bryant	7/29/2022	217.5	NaN
Kobe Bryant	8/9/2022	220.5	NaN
Jeff Daniels	7/28/2022	211.1	NaN
Jeff Daniels	8/8/2022	210.5	NaN
Jeff Daniels	8/28/2022	213	NaN

df2

index = [0,1,2]
a = pd.Series(["John Smith", "Kobe Bryant", "Jeff Daniels"], index=index)
b = pd.Series([195, 230, 220], index=index)
df2 = pd.DataFrame({"Name": a, "Weight Goal": b})

or df2 in this format:

Name	Weight Goal
John Smith	195
Kobe Bryant	230
Jeff Daniels	220

What I want to do is go through df1 and set the respective weight goal from df2 for each player, but only for the August rows. The July dates should be ignored.

I know that I shouldn't be using a for loop with a dataframe/pandas, but I think showing my thought process with one might make clear what I was trying to achieve with my code attempts.

for player in df1['Name']:
    df1 = df1.loc[(df1['Name'] == f'{player}') & (df1['Date'] > '8/1/2022')]
    df1.at[df2['Name'] == f'{player}',  'Goal'] = (df2.loc[df2.Name == f'{player}']['Weight Goal'])

This just ends up delivering an empty dataframe and a SettingWithCopyWarning. I know this is not the right way to do this, but I thought it might help show what I'm after.

Thank You.

CodePudding user response:

If I correctly understand the output you are after (Stack Overflow tip: it can be useful to provide a sample of your desired output to help people trying to answer your question), then this should work:

# make the Date column into datetime type so it is easier to filter on
df1 = df1.assign(Date=pd.to_datetime(df1.Date))

# separate out the august rows from the other months
df1_august = df1.loc[df1.Date.dt.month == 8]
df1_other_months = df1.loc[df1.Date.dt.month != 8]

# use a merge rather than a loop to get WeightGoal column in place
df1_august_merged = df1_august.merge(df2, on="Name")

# finally add the rows for the other months back in
final_df = pd.concat([df1_august_merged, df1_other_months])

print(final_df)
           Name       Date  Weight  Goal  Weight Goal
0    John Smith 2022-08-07   187.0   NaN        195.0
1    John Smith 2022-08-29   186.5   NaN        195.0
2   Kobe Bryant 2022-08-09   220.5   NaN        230.0
3  Jeff Daniels 2022-08-08   210.5   NaN        220.0
4  Jeff Daniels 2022-08-28   213.0   NaN        220.0
0    John Smith 2022-07-29   185.0   NaN          NaN
3   Kobe Bryant 2022-07-09   212.5   NaN          NaN
4   Kobe Bryant 2022-07-29   217.5   NaN          NaN
6  Jeff Daniels 2022-07-28   211.1   NaN          NaN
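
If you would rather fill the existing Goal column and keep the rows in their original order, here is a minimal sketch of one alternative: build a Name-to-goal lookup from df2 and assign it only where the date falls in August. The names goal_by_name and is_august are just illustrative, and the sketch assumes the dates are always in month/day/year form and that each name appears only once in df2.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frames from the question so the sketch runs on its own
df1 = pd.DataFrame({
    "Name": ["John Smith"] * 3 + ["Kobe Bryant"] * 3 + ["Jeff Daniels"] * 3,
    "Date": ["7/29/2022", "8/7/2022", "8/29/2022",
             "7/9/2022", "7/29/2022", "8/9/2022",
             "7/28/2022", "8/8/2022", "8/28/2022"],
    "Weight": [185, 187, 186.5, 212.5, 217.5, 220.5, 211.1, 210.5, 213],
    "Goal": np.nan,
})
df2 = pd.DataFrame({
    "Name": ["John Smith", "Kobe Bryant", "Jeff Daniels"],
    "Weight Goal": [195, 230, 220],
})

# Parse the dates (assumed month/day/year) so the month can be tested directly
df1["Date"] = pd.to_datetime(df1["Date"], format="%m/%d/%Y")

# Lookup Series: index is the player name, values are the weight goals
goal_by_name = df2.set_index("Name")["Weight Goal"]

# Boolean mask selecting only the August rows
is_august = df1["Date"].dt.month == 8

# Fill Goal for August rows only; July rows keep their NaN
df1.loc[is_august, "Goal"] = df1.loc[is_august, "Name"].map(goal_by_name)

print(df1)
```

This does the assignment in a single .loc call on df1 itself. The loop in the question reassigns df1 to a filtered subset on every pass, so as soon as the player name changes nothing matches any more and the frame ends up empty.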