Improving the efficiency of pandas nested for loops-CodePudding

I have a question about the following data on a multiple-day match where people collected points on different days of the match. I want to sum the points gotten by one person (with a unique key) on that person's first, second, and third day and I have two data frames for this (see frames A and B). The problem is that the code I wrote to obtain the result, takes a long time and is very inefficient. I can't seem to figure out how to apply/map works in the case where two values have to be matched (not only person key but also the day). Can anybody help me out with this? I am using Pandas in Python.

Example:

Dataframe A

Person key	Start day	Day	Points
123	10	10	5
123	10	11	1
888	1	1	6
888	1	2	2
888	1	3	4

Dataframe B

Person key	Day 1	Day 2	Day 3	points day 1	points day 2	points day 3
123	10	11	12	0	0	0
888	1	2	3	0	0	0

What I would like to get:

Person key	Day 1	Day 2	Day 3	points day 1	points day 2	points day 3
123	10	11	12	5	1	0
888	1	2	3	6	2	4

The code I tried:

for i in range (0,len(dfB)):
    for t in range(0,len(dfA)): 
        if dfB['Day 1'].iloc[i] == dfA['Day'].iloc[t] and dfB['Person key'].iloc[i] == dfA['Person key'].iloc[t]:
            dfB['points day 1'].iloc[i] = dfA['Points'].iloc[t]
            continue
        if dfB['Day 2'].iloc[i] == dfA['Day'].iloc[t] and dfB['Person key'].iloc[i] == dfA['Person key'].iloc[t]:
            dfB['points day 2'].iloc[i] = dfA['Points'].iloc[t]
            continue
        if dfB['Day 3'].iloc[i] == dfA['Day'].iloc[t] and dfB['Person key'].iloc[i] == dfA['Person key'].iloc[t]:
            dfB['points day 3'].iloc[i] = dfA['Points'].iloc[t]
            continue

CodePudding user response：

Try:

melt dfA to match the structure of dfB
merge both DataFrames
pivot and rename columns to get the required format

melted = dfB.melt("Person key", ["Day 1", "Day 2", "Day 3"], value_name="Day")
output = melted.merge(dfA, on=["Person key", "Day"], how="left").drop("Start day", axis=1)
output = output.pivot("Person key", "variable", ["Day", "Points"]).fillna(0)
output.columns = output.columns.map(lambda x: x[1] if x[0]=="Day" else " ".join(x))

>>> output
            Day 1  Day 2  Day 3  Points Day 1  Points Day 2  Points Day 3
Person key                                                               
123          10.0   11.0   12.0           5.0           1.0           0.0
888           1.0    2.0    3.0           6.0           2.0           4.0

CodePudding user response：

You might wanna convert your DataFrames to dictionarys, do your calculations and convert the reuslt dictionary back to a pandas DataFrame. You can just call the to_dict() function of your DataFrame: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html and the from_dict() function to convert a dict to a pandas DataFrame: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.from_dict.html

There are some nice entrys here on stackoverflow about the performance of dictionarys:

Is there anything faster than dict()?

Pandas DataFrame performance