Home > database >  Merging data from a data frame with data from a dictionary
Merging data from a data frame with data from a dictionary

Time:11-17

I have a dictionary with keys that are a date time, and values that are data frames from that day, like so:

dict = {dt.datetime(2021,11,1):pd.DataFrame({'team':['horses','eagles'],
                                             'rating':[15,20]}),
        dt.datetime(2021,11,2):pd.DataFrame({'team':['horses','eagles'],
                                             'rating':[35,50]})}

{datetime.datetime(2021, 11, 1, 0, 0):
      team  rating
0  horses      15
1  eagles      20,
 datetime.datetime(2021, 11, 2, 0, 0):
      team  rating
0  horses      35
1  eagles      50}

I also have a separate data frame with a date time column and the names of those teams in the same row, like so:

df = pd.DataFrame({'date':[dt.datetime(2021,11,1),dt.datetime(2021,11,2)],
                       'team_1':'horses',
                       'team_2':'eagles'})

        date  team_1  team_2
0 2021-11-01  horses  eagles
1 2021-11-02  horses  eagles


What I would like to do is to iterate through the dictionary so that if the datetime key matches a date in the date column of df, then to take the data from the value and merge it to teams in df to create a final data frame, like so:

        date  team_1  team_1_rating  team_2  team_2_rating
0 2021-11-01  horses             15  eagles             20
1 2021-11-02  horses             35  eagles             50

I tried to iterate through the dictionary and merge the data with this code:

for key, value in dict.items:
    if key == df['date']:
        final_df = pd.merge(df,value,left_on = ['team_1','team_2'], right_on = 'team')

But I get an error like

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I'm not quite sure if this is a good way to do it or not but any help would be appreciated!!

CodePudding user response:

The error lies here:

if key == df['date']:

It says it is ambiguous, because indeed, it doesn't know to which cell to compare it.

For example:

x = '2021-11-01'
if x == df['date']:
    print ("hola")

It will give you that error.

So,you will need to loop through df.iloc

CodePudding user response:

Your issue is the line if key == df['date'] doesn't do what you think it does. You are comparing if key (which is a datetime object) is the same as df['date'], which is a pandas.Series. Because these two objects are not a str, int, float etc, Python tries to compare the bool of each of the two. This exception is raised when you try to get the bool of a pandas.Series.

You could use a for loop to iterate through each value in df['data'], but a much cleaner method is to use the built-in Pandas functionality of loc (docs here).

import datetime as dt
import pandas as pd
import numpy as np

my_dict = {dt.datetime(2021,11,1):pd.DataFrame({'team':['horses','eagles'],
                                             'rating':[15,20]}),
        dt.datetime(2021,11,2):pd.DataFrame({'team':['horses','eagles'],
                                             'rating':[35,50]})}

df = pd.DataFrame({'date':[dt.datetime(2021,11,1),dt.datetime(2021,11,2)],
                       'team_1':'horses',
                       'team_2':'eagles'})

# Searching for key match in df
for key, value in my_dict.items():
    df.loc[df['date'] == key, 'team_1_rating'] = value['rating'][0]
    df.loc[df['date'] == key, 'team_2_rating'] = value['rating'][1]

# Reordering columns to match desired output
col_list = list(df)
col_list[2], col_list[3] = col_list[3], col_list[2]
df.columns = col_list

print(df)

Note the bottom block just reorders your dataframe to match your desired output.

  • Related