More efficient way to write this for loop?-CodePudding

import pandas as pd

sim = [['Matthew Stafford', 15, 13, 12], ['Dalvin Cook', 18, 16, 17], ['Daniel Jones', 17, 17, 15], ['Joe Mixon', 16, 15, 15]]
col = ['Player', 1 , 2, 3]
NFL_Sim = pd.DataFrame(sim, columns=col)
list = [['Matthew Stafford', 'Dalvin Cook'], ['Daniel Jones', 'Joe Mixon']]
col = ['QB', 'RB']
output_lines = pd.DataFrame(list, columns=col)

for x in range(1, 4):
    output_lines[x] = output_lines.QB.map(NFL_Sim.set_index('Player')[x].to_dict())   output_lines.RB.map(NFL_Sim.set_index('Player')[x].to_dict())

print(output_lines)

                 QB           RB   1   2   3
0  Matthew Stafford  Dalvin Cook  33  29  29
1      Daniel Jones    Joe Mixon  33  32  30

The desired output is correct, however as I scale this up, I have thousands of columns in the NFL_Sim dataframe which makes mapping extremely slow. Is there a more efficient way to write this for loop? Or convert output_lines to a list first? I'm really not sure what's best.

CodePudding user response：

A much more dynamic way with melt:

>>> x = output_lines.melt(value_name='Player', ignore_index=False).merge(NFL_Sim, on='Player')
>>> output_lines = output_lines.join(x.loc[[*x.index[::2], *x.index[1::2]]].groupby(x.index // 2).sum())
>>> output_lines
                 RB           QB   1   2   3
0  Matthew Stafford  Dalvin Cook  33  29  29
1      Daniel Jones    Joe Mixon  33  32  30
>>>

CodePudding user response：

First, I suggest you set the index of NFL_Sim once, when it's created. That way you don't have to do it twice inside the loop.

Second, if you have a list of quarterbacks and a list of running backs, I suggest you create two matrices: one for the quarterbacks and one for the running backs. Then you can add these two together.

import pandas as pd

sim = [['Matthew Stafford', 15, 13, 12], ['Dalvin Cook', 18, 16, 17], ['Daniel Jones', 17, 17, 15], ['Joe Mixon', 16, 15, 15]]
col = ['Player', 1 , 2, 3]
NFL_Sim = pd.DataFrame(sim, columns=col).set_index('Player')
qbs = ['Matthew Stafford', 'Daniel Jones']
rbs = ['Dalvin Cook', 'Joe Mixon']

qb_scores = NFL_Sim.loc[qbs, :]
rb_scores = NFL_Sim.loc[rbs, :]
# We need to reset the index because otherwise the addition
# of qb_scores and rb_scores will not be compatible; they have
# different indexes
output = qb_scores.reset_index(drop=True)   rb_scores.reset_index(drop=True)
output = output.assign(QB=qbs, RB=rbs)

CodePudding user response：

Create your mapping, using a Series: (the pairing of QB and RB is taken care of in output_lines, we want to use the index positions to connect to NFL_Sim)

mapping = output_lines.T.stack()
mapping = pd.Series(mapping.index.droplevel(0), mapping)

Get the sum for each position:

mapping = (NFL_Sim.assign(positions = lambda df: df.Player.map(mapping))
           # we do not need the Player column anymore,
           # since we have our mapping
                 .select_dtypes('number')
                 .groupby('positions')
                 .sum()
           )

Reconnect mapping back to output_lines

output_lines.join(mapping)

                 QB           RB   1   2   3
0  Matthew Stafford  Dalvin Cook  33  29  29
1      Daniel Jones    Joe Mixon  33  32  30