Correct way to iterate over two dataframes to set specific values based on the value of another df

Time:09-02

Edited to add an easier-to-reproduce dataframe.

I have two dataframes that look something like this:

df1

import numpy as np
import pandas as pd

index = [0, 1, 2, 3, 4, 5, 6, 7, 8]
a = pd.Series(["John Smith", "John Smith", "John Smith", "Kobe Bryant", "Kobe Bryant", "Kobe Bryant", "Jeff Daniels", "Jeff Daniels", "Jeff Daniels"], index=index)
b = pd.Series(["7/29/2022", "8/7/2022", "8/29/2022", "7/9/2022", "7/29/2022", "8/9/2022", "7/28/2022", "8/8/2022", "8/28/2022"], index=index)
c = pd.Series([185, 187, 186.5, 212.5, 217.5, 220.5, 211.1, 210.5, 213], index=index)
d = pd.Series(np.nan, index=index)  # Goal column, empty (NaN) for now
df1 = pd.DataFrame({"Name": a, "Date": b, "Weight": c, "Goal": d})

or df1 in this format:

Name Date Weight Goal
John Smith 7/29/2022 185 NaN
John Smith 8/7/2022 187 NaN
John Smith 8/29/2022 186.5 NaN
Kobe Bryant 7/9/2022 212.5 NaN
Kobe Bryant 7/29/2022 217.5 NaN
Kobe Bryant 8/9/2022 220.5 NaN
Jeff Daniels 7/28/2022 211.1 NaN
Jeff Daniels 8/8/2022 210.5 NaN
Jeff Daniels 8/28/2022 213 NaN

df2

index = [0, 1, 2]
a = pd.Series(["John Smith", "Kobe Bryant", "Jeff Daniels"], index=index)
b = pd.Series([195, 230, 220], index=index)
df2 = pd.DataFrame({"Name": a, "Weight Goal": b})

or df2 in this format:

Name Weight Goal
John Smith 195
Kobe Bryant 230
Jeff Daniels 220

What I want to do is go through df1 and set the respective weight goal from df2 for each player, but only for the August rows. I want to ignore the July dates.

I know that I shouldn't be using a for loop with a dataframe/pandas, but I think showing my thought process with one might make clear what I was trying to achieve with my code attempts.

for player in df1['Name']:
    df1 = df1.loc[(df1['Name'] == f'{player}') & (df1['Date'] > '8/1/2022')]
    df1.at[df2['Name'] == f'{player}',  'Goal'] = (df2.loc[df2.Name == f'{player}']['Weight Goal'])

This just ends up delivering an empty dataframe and a SettingWithCopyWarning. I know this is not the right way to do this, but I thought it might help point me in the right direction.

Thank You.

CodePudding user response:

If I correctly understand the output you are after (Stack Overflow tip: it helps to include a sample of your desired output for people trying to answer your question), then this should work:

# make the Date column into datetime type so it is easier to filter on
df1 = df1.assign(Date=pd.to_datetime(df1.Date))

# separate out the august rows from the other months
df1_august = df1.loc[df1.Date.dt.month == 8]
df1_other_months = df1.loc[df1.Date.dt.month != 8]

# use a merge rather than a loop to get WeightGoal column in place
df1_august_merged = df1_august.merge(df2, on="Name")

# finally add the rows for the other months back in
final_df = pd.concat([df1_august_merged, df1_other_months])

print(final_df)
           Name       Date  Weight  Goal  Weight Goal
0    John Smith 2022-08-07   187.0   NaN        195.0
1    John Smith 2022-08-29   186.5   NaN        195.0
2   Kobe Bryant 2022-08-09   220.5   NaN        230.0
3  Jeff Daniels 2022-08-08   210.5   NaN        220.0
4  Jeff Daniels 2022-08-28   213.0   NaN        220.0
0    John Smith 2022-07-29   185.0   NaN          NaN
3   Kobe Bryant 2022-07-09   212.5   NaN          NaN
4   Kobe Bryant 2022-07-29   217.5   NaN          NaN
6  Jeff Daniels 2022-07-28   211.1   NaN          NaN
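
If you would rather keep the original row order and fill the existing Goal column instead of adding a separate Weight Goal column, here is a minimal sketch of an alternative. It is not the merge approach above: it looks up each name in df2 with Series.map and then blanks out the non-August rows with a boolean mask. The names goal_by_name and is_august are my own, and the sample frames are rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the sample data from the question so the snippet is self-contained.
df1 = pd.DataFrame({
    "Name": ["John Smith"] * 3 + ["Kobe Bryant"] * 3 + ["Jeff Daniels"] * 3,
    "Date": ["7/29/2022", "8/7/2022", "8/29/2022",
             "7/9/2022", "7/29/2022", "8/9/2022",
             "7/28/2022", "8/8/2022", "8/28/2022"],
    "Weight": [185, 187, 186.5, 212.5, 217.5, 220.5, 211.1, 210.5, 213],
})
df2 = pd.DataFrame({
    "Name": ["John Smith", "Kobe Bryant", "Jeff Daniels"],
    "Weight Goal": [195, 230, 220],
})

# Parse the month/day/year strings into real datetimes.
df1["Date"] = pd.to_datetime(df1["Date"], format="%m/%d/%Y")

# Lookup table: player name -> weight goal.
goal_by_name = df2.set_index("Name")["Weight Goal"]

# True for August rows, False otherwise.
is_august = df1["Date"].dt.month == 8

# Map every name to its goal, then keep the value only on August rows;
# the remaining rows become NaN.
df1["Goal"] = df1["Name"].map(goal_by_name).where(is_august)

print(df1)
```

This checks the month only, which is enough for the sample data. If your real data spans more than one year, you would also want to compare df1["Date"].dt.year.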