Edited to add easier to reproduce dataframe
I have two dataframes that look something like this:
df1
index = [0,1,2,3,4,5,6,7,8]
a = pd.Series([John Smith, John Smith, John Smith, Kobe Bryant, Kobe Bryant, Kobe Bryant, Jeff Daniels, Jeff Daniels, Jeff Daniels],index= index)
b = pd.Series([7/29/2022, 8/7/2022, 8/29/2022, 7/9/2022, 7/29/2022, 8/9/2022, 7/28/2022, 8/8/2022, 8/28/2022],index= index)
c = pd.Series([185, 187, 186.5, 212.5, 217.5, 220.5, 211.1, 210.5, 213],index= index)
d = pd.Series([],index= index)
df1 = pd.DataFrame(np.d_[a,b,c],columns = ["Name","Date","Weight","Goal"])
or df1 in this format:
Name | Date | Weight | Goal |
---|---|---|---|
John Smith | 7/29/2022 | 185 | NaN |
John Smith | 8/7/2022 | 187 | NaN |
John Smith | 8/29/2022 | 186.5 | NaN |
Kobe Bryant | 7/9/2022 | 212.5 | NaN |
Kobe Bryant | 7/29/2022 | 217.5 | NaN |
Kobe Bryant | 8/9/2022 | 220.5 | NaN |
Jeff Daniels | 7/28/2022 | 211.1 | NaN |
Jeff Daniels | 8/8/2022 | 210.5 | NaN |
Jeff Daniels | 8/28/2022 | 213 | NaN |
df2
index = [0,1,2]
a = pd.Series([John Smith, Kobe Bryant, Jeff Daniels],index= index)
b = pd.Series([195,230,220],index= index)
c = pd.Series([],index= index)
df2 = pd.DataFrame(np.c_[a,b],columns = ["Name", "Weight Goal"])
or df2 in this format:
Name | Weight Goal |
---|---|
John Smith | 195 |
Kobe Bryant | 230 |
Jeff Daniels | 220 |
What I want to do is iterate through df1 and set respective weight goal from df2 for each player...but I only want to do this in August, I want to ignore the July dates.
I know that I shouldn't be using a for loop with a dataframe/pandas but I think me showing my mental thought process with one might show the intent that I was trying to achieve with my code attempts.
for player in df1['Name']:
df1 = df1.loc[(df1['Name'] == f'{player}') & (df1['Date'] > '8/1/2022')]
df1.at[df2['Name'] == f'{player}', 'Goal'] = (df2.loc[df2.Name == f'{player}']['Weight Goal'])
This just ends up delivering an empty dataframe & a settingwithcopy warning. I know this is not the right way to do this but I thought it might help to direct me.
Thank You.
CodePudding user response:
If I correctly understand the output you are after (stack overflow tip: it can be useful to provide a sample of your desired output to help people trying to answer your question), then this should work:
# make the Date column into datetime type so it is easier to filter on
df1 = df1.assign(Date=pd.to_datetime(df1.Date))
# separate out the august rows from the other months
df1_august = df1.loc[df1.Date.apply(lambda x: x.month == 8)]
df1_other_months = df1.loc[df1.Date.apply(lambda x: x.month != 8)]
# use a merge rather than a loop to get WeightGoal column in place
df1_august_merged = df1_august.merge(df2, on="Name")
# finally add the rows for the other months back in
final_df = pd.concat([df1_august_merged, df1_other_months])
print(final_df)
Name Date Weight Goal Weight Goal
0 John Smith 2022-08-07 187.0 NaN 195.0
1 John Smith 2022-08-29 186.5 NaN 195.0
2 Kobe Bryant 2022-08-09 220.5 NaN 230.0
3 Jeff Daniels 2022-08-08 210.5 NaN 220.0
4 Jeff Daniels 2022-08-28 213.0 NaN 220.0
0 John Smith 2022-07-29 185.0 NaN NaN
3 Kobe Bryant 2022-07-09 212.5 NaN NaN
4 Kobe Bryant 2022-07-29 217.5 NaN NaN
6 Jeff Daniels 2022-07-28 211.1 NaN NaN