I have a dictionary with keys that are a date time, and values that are data frames from that day, like so:
dict = {dt.datetime(2021,11,1):pd.DataFrame({'team':['horses','eagles'],
'rating':[15,20]}),
dt.datetime(2021,11,2):pd.DataFrame({'team':['horses','eagles'],
'rating':[35,50]})}
{datetime.datetime(2021, 11, 1, 0, 0):
team rating
0 horses 15
1 eagles 20,
datetime.datetime(2021, 11, 2, 0, 0):
team rating
0 horses 35
1 eagles 50}
I also have a separate data frame with a date time column and the names of those teams in the same row, like so:
df = pd.DataFrame({'date':[dt.datetime(2021,11,1),dt.datetime(2021,11,2)],
'team_1':'horses',
'team_2':'eagles'})
date team_1 team_2
0 2021-11-01 horses eagles
1 2021-11-02 horses eagles
What I would like to do is to iterate through the dictionary so that if the datetime key matches a date in the date column of df, then to take the data from the value and merge it to teams in df to create a final data frame, like so:
date team_1 team_1_rating team_2 team_2_rating
0 2021-11-01 horses 15 eagles 20
1 2021-11-02 horses 35 eagles 50
I tried to iterate through the dictionary and merge the data with this code:
for key, value in dict.items:
if key == df['date']:
final_df = pd.merge(df,value,left_on = ['team_1','team_2'], right_on = 'team')
But I get an error like
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I'm not quite sure if this is a good way to do it or not but any help would be appreciated!!
CodePudding user response:
The error lies here:
if key == df['date']:
It says it is ambiguous, because indeed, it doesn't know to which cell to compare it.
For example:
x = '2021-11-01'
if x == df['date']:
print ("hola")
It will give you that error.
So,you will need to loop through df.iloc
CodePudding user response:
Your issue is the line if key == df['date']
doesn't do what you think it does. You are comparing if key (which is a datetime object) is the same as df['date']
, which is a pandas.Series
. Because these two objects are not a str
, int
, float
etc, Python tries to compare the bool
of each of the two. This exception is raised when you try to get the bool
of a pandas.Series
.
You could use a for loop to iterate through each value in df['data']
, but a much cleaner method is to use the built-in Pandas functionality of loc
(docs here).
import datetime as dt
import pandas as pd
import numpy as np
my_dict = {dt.datetime(2021,11,1):pd.DataFrame({'team':['horses','eagles'],
'rating':[15,20]}),
dt.datetime(2021,11,2):pd.DataFrame({'team':['horses','eagles'],
'rating':[35,50]})}
df = pd.DataFrame({'date':[dt.datetime(2021,11,1),dt.datetime(2021,11,2)],
'team_1':'horses',
'team_2':'eagles'})
# Searching for key match in df
for key, value in my_dict.items():
df.loc[df['date'] == key, 'team_1_rating'] = value['rating'][0]
df.loc[df['date'] == key, 'team_2_rating'] = value['rating'][1]
# Reordering columns to match desired output
col_list = list(df)
col_list[2], col_list[3] = col_list[3], col_list[2]
df.columns = col_list
print(df)
Note the bottom block just reorders your dataframe to match your desired output.