I have 2 tables. I want to take DF1 and adjust the values in the tables given the values in DF2. DF2 is simply a groupby of a column in DF1. In domain terms, I am trying to adjust the players statistics in DF1 for difficulty of the league in DF2. I'm having trouble explaining my question so I provided pictures of the tables to try and help. You'll see that DF1 has column 'comp' which corresponds to DF2. For DF1 I want to find "comp" in the index of DF2 and return the value associated w/ the column.
Could a variation of this solve my problem?
CodePudding user response:
Here is a solution! It involves stacking df1 and df2 so that they can be matched on index and column. As df1's "comp" has duplicate values, they needed to be merged and then unstacked one by one (as unstack doesn't work with duplicate values).
Data:
import pandas as pd
df1 = pd.DataFrame({'player': {1: 'Yunis Abdelhamid', 2: 'Salis Abdul Samed', 3: 'Laurent Abergel', 5: 'Dickson Abiama', 7: 'Tammy Abraham'}, 'nation': {1: 'ma MAR', 2: 'gh GHA', 3: 'fr FRA', 5: 'ng NGA', 7: 'eng ENG'}, 'pos': {1: 'DF', 2: 'MF', 3: 'MF', 5: 'FW', 7: 'FW'}, 'squad': {1: 'Reims', 2: 'Clermont Foot', 3: 'Lorient', 5: 'Greuther Fürth', 7: 'Roma'}, 'comp': {1: 'fr Ligue 1', 2: 'fr Ligue 1', 3: 'fr Ligue 1', 5: 'de Bundesliga', 7: 'it Serie A'}, 'age': {1: '34-227', 2: '22-048', 3: '29-101', 5: '23-191', 7: '24-223'}, 'born': {1: 1987.0, 2: 2000.0, 3: 1993.0, 5: 1998.0, 7: 1997.0}, '90s': {1: 31.1, 2: 27.4, 3: 30.9, 5: 8.1, 7: 32.3}, 'touches': {1: 1784, 2: 1930, 3: 1881, 5: 267, 7: 1055}, 'def_pen': {1: 249, 2: 55, 3: 22, 5: 9, 7: 33}, 'def_3rd': {1: 1007, 2: 621, 3: 428, 5: 28, 7: 85}, 'mid_3rd': {1: 817, 2: 1144, 3: 1229, 5: 126, 7: 536}, 'att_3rd': {1: 90, 2: 299, 3: 345, 5: 126, 7: 489}, 'att_pen': {1: 27, 2: 17, 3: 15, 5: 31, 7: 186}, 'live': {1: 1693, 2: 1914, 3: 1860, 5: 258, 7: 991}, 'succ': {1: 15, 2: 27, 3: 41, 5: 6, 7: 36}, 'att': {1: 21, 2: 42, 3: 63, 5: 18, 7: 75}, 'succ%': {1: 71.4, 2: 64.3, 3: 65.1, 5: 33.3, 7: 48.0}, '#pl': {1: 15, 2: 30, 3: 43, 5: 7, 7: 38}, 'megs': {1: 1, 2: 2, 3: 3, 5: 1, 7: 3}, 'carries': {1: 1113, 2: 1467, 3: 1384, 5: 154, 7: 582}, 'totdist': {1: 6346, 2: 6755, 3: 5244, 5: 605, 7: 2464}, 'prgdist': {1: 3563, 2: 2912, 3: 2639, 5: 326, 7: 1249}, 'prog': {1: 12, 2: 47, 3: 55, 5: 48, 7: 215}, '1/3': {1: 23, 2: 20, 3: 34, 5: 8, 7: 26}, 'cpa': {1: 0, 2: 4, 3: 3, 5: 7, 7: 22}, 'mis': {1: 14, 2: 23, 3: 28, 5: 41, 7: 77}, 'dis': {1: 13, 2: 40, 3: 45, 5: 11, 7: 42}, 'targ': {1: 1154, 2: 1605, 3: 1394, 5: 335, 7: 1340}, 'rec': {1: 1120, 2: 1484, 3: 1299, 5: 171, 7: 726}, 'rec%': {1: 97.1, 2: 92.5, 3: 93.2, 5: 51.0, 7: 54.2}})
df2 = pd.DataFrame({pd.NA: {'de Bundesliga': 0.04705106010159299, 'eng Premier League': 0.03905863519731234, 'es La Liga': -0.019957692390575543, 'fr Ligue 1': -0.07678625197206412, 'it Serie A': 0.018426938326577158}, 'born': {'de Bundesliga': 0.00028905941112555844, 'eng Premier League': -9.456827960363512e-05, 'es La Liga': -0.00047840483346683893, 'fr Ligue 1': 0.00038106934561454864, 'it Serie A': -4.342300278559197e-05}, '90s': {'de Bundesliga': -0.02242504625114483, 'eng Premier League': 0.048731922976664066, 'es La Liga': -0.0149905066318059, 'fr Ligue 1': 0.013935130334292944, 'it Serie A': -0.025416536073811447}, 'touches': {'de Bundesliga': -0.007012081552341343, 'eng Premier League': 0.050874969806333326, 'es La Liga': -0.04893072600904547, 'fr Ligue 1': 0.04334758542412609, 'it Serie A': -0.034809311810162336}, 'def_pen': {'de Bundesliga': 0.04988992932781988, 'eng Premier League': 0.05881763373065385, 'es La Liga': -0.05579752269373517, 'fr Ligue 1': -0.04481705729566732, 'it Serie A': 0.0024189734337858315}, 'def_3rd': {'de Bundesliga': 0.06186573480297897, 'eng Premier League': 0.0389278556215269, 'es La Liga': -0.07767188775294054, 'fr Ligue 1': 0.023015581752473535, 'it Serie A': -0.032393749456107734}, 'mid_3rd': {'de Bundesliga': -0.028539540123693752, 'eng Premier League': 0.016639910674564273, 'es La Liga': -0.04426832916843915, 'fr Ligue 1': 0.09747239212657255, 'it Serie A': -0.0418622749313895}, 'att_3rd': {'de Bundesliga': -0.0677258003546225, 'eng Premier League': 0.1278948628925125, 'es La Liga': -0.024569324379658197, 'fr Ligue 1': -0.021387126771416076, 'it Serie A': -0.018464610046065477}, 'att_pen': {'de Bundesliga': 0.007110827172613909, 'eng Premier League': 0.14160451509170202, 'es La Liga': -0.0611734650276724, 'fr Ligue 1': -0.0881088201338538, 'it Serie A': 0.007444826761989098}, 'live': {'de Bundesliga': -0.008450858104324888, 'eng Premier League': 0.05603909437288501, 'es La Liga': -0.05494276127792341, 'fr Ligue 1': 0.04560568557076894, 'it Serie A': -0.03466810441351198}, 'succ': {'de Bundesliga': -0.012362165147406401, 'eng Premier League': 0.06866777074508845, 'es La Liga': -0.055637228100598124, 'fr Ligue 1': 0.09372007504093172, 'it Serie A': -0.08856653505382717}, 'att': {'de Bundesliga': 0.005336824355495873, 'eng Premier League': 0.0587320294194591, 'es La Liga': -0.05149158091787276, 'fr Ligue 1': 0.0764454862643873, 'it Serie A': -0.08138227424215116}, 'succ%': {'de Bundesliga': -0.030998334917095582, 'eng Premier League': 0.0006975699587437045, 'es La Liga': -0.0020114333371678006, 'fr Ligue 1': 0.01996310166029347, 'it Serie A': 0.008014587372980442}, '#pl': {'de Bundesliga': -0.011176015456384025, 'eng Premier League': 0.06178559510622872, 'es La Liga': -0.053111868552582875, 'fr Ligue 1': 0.0908085024342733, 'it Serie A': -0.08285310712659288}, 'megs': {'de Bundesliga': 0.0253126184160668, 'eng Premier League': 0.09265353935404708, 'es La Liga': 0.0054787317945212965, 'fr Ligue 1': -0.040851040851040854, 'it Serie A': -0.07288946347769876}, 'carries': {'de Bundesliga': -0.03903207021173538, 'eng Premier League': 0.05232455323100438, 'es La Liga': -0.06356832873867313, 'fr Ligue 1': 0.07645657929266236, 'it Serie A': -0.02724465858512981}, 'totdist': {'de Bundesliga': -0.04812044800721793, 'eng Premier League': 0.04704197170020219, 'es La Liga': -0.07132521472124087, 'fr Ligue 1': 0.07140894086202842, 'it Serie A': -0.002442635578199015}, 'prgdist': {'de Bundesliga': -0.0714999888726322, 'eng Premier League': 0.05420271969861634, 'es La Liga': -0.06654412613544947, 'fr Ligue 1': 0.09418730246570628, 'it Serie A': -0.016453324264714242}, 'prog': {'de Bundesliga': 0.025766138682432826, 'eng Premier League': 0.04259129468490075, 'es La Liga': -0.05473368088233732, 'fr Ligue 1': 0.0207709163158738, 'it Serie A': -0.026641578115484843}, '1/3': {'de Bundesliga': -0.13259455956555866, 'eng Premier League': 0.09116541850170368, 'es La Liga': -0.0502927350878537, 'fr Ligue 1': 0.03095126212099486, 'it Serie A': 0.04387983864146161}, 'cpa': {'de Bundesliga': -0.10066067674087131, 'eng Premier League': 0.22614740634523578, 'es La Liga': -0.09419357940409367, 'fr Ligue 1': -0.0954191127746733, 'it Serie A': 0.05679137057883232}, 'mis': {'de Bundesliga': 0.07151617153965573, 'eng Premier League': -0.01747840767010267, 'es La Liga': -0.03129327808221427, 'fr Ligue 1': 0.03944729994673235, 'it Serie A': -0.04935434332643396}, 'dis': {'de Bundesliga': -0.0014228976061350096, 'eng Premier League': 0.06993068622265519, 'es La Liga': -0.062821726256443, 'fr Ligue 1': 0.0739611904587274, 'it Serie A': -0.07270645869717707}, 'targ': {'de Bundesliga': -0.02989375871600719, 'eng Premier League': 0.0619179241550869, 'es La Liga': -0.062229984078524536, 'fr Ligue 1': 0.06113385051717368, 'it Serie A': -0.030263477410824247}, 'rec': {'de Bundesliga': -0.03896982275664396, 'eng Premier League': 0.04868701591904645, 'es La Liga': -0.06490963275976946, 'fr Ligue 1': 0.0777484061789595, 'it Serie A': -0.023841319468341227}, 'rec%': {'de Bundesliga': -0.01237444148958411, 'eng Premier League': -0.0104537113692863, 'es La Liga': -0.005015404228401965, 'fr Ligue 1': 0.012032534271996465, 'it Serie A': 0.013233565733276276}})
Stacking df2. To merge on the index, multi-index columns need to have the same names.
# stacked df2 to to make it possible to match on index and column
df4 = df2.stack().to_frame()
df4.index.names = ["comp", "col"]
Merge and multiply:
# output dataframe
df5 = pd.DataFrame(columns=df1.columns)
# for each row in df1
for i in range(0, len(df1)):
# stack each row of the df1
p = df1.iloc[i:i 1].set_index("comp").stack().to_frame()
p.index.names = ["comp", "col"]
# merge with df4, and multiply row, unstack to a dataframe, convert to list and append
df5 = df5.append(p.merge(df4, left_index=True, right_index=True, how="left").prod(axis=1).unstack().iloc[0])
# reset the index so that "comp" is a column not the index
df5.reset_index(drop=False, inplace=True)
df5.rename(columns={"index": "comp"}, inplace=True)
Output:
df5
# comp player nation pos squad comp age born 90s touches def_pen def_3rd mid_3rd att_3rd att_pen live succ att succ% #pl megs carries totdist prgdist prog 1/3 cpa mis dis targ rec rec%
#0 fr Ligue 1 Yunis Abdelhamid ma MAR DF Reims NaN 34-227 0.757185 0.433383 77.332092 -11.159447 23.176691 79.634944 -1.924841 -2.378938 77.210426 1.405801 1.605355 1.425365 1.362128 -0.040851 85.096173 453.161139 335.589359 0.249251 0.711879 -0.000000 0.552262 0.961495 70.548463 87.078215 1.168359
#1 fr Ligue 1 Salis Abdul Samed gh GHA MF Clermont Foot NaN 22-048 0.762139 0.381823 83.660840 -2.464938 14.292676 111.508417 -6.394751 -1.497850 87.289282 2.530442 3.210710 1.283627 2.724255 -0.081702 112.161802 482.367396 274.273425 0.976233 0.619025 -0.381676 0.907288 2.958448 98.119830 115.378635 1.113009
#2 fr Ligue 1 Laurent Abergel fr FRA MF Lorient NaN 29-101 0.759471 0.430596 81.536808 -0.985975 9.850669 119.793570 -7.378559 -1.321632 84.826575 3.842523 4.816066 1.299598 3.904766 -0.122553 105.815906 374.468486 248.560291 1.142400 1.052343 -0.286257 1.104524 3.328254 85.220588 100.995180 1.121432
#3 de Bundesliga Dickson Abiama ng NGA FW Greuther Fürth NaN 23-191 0.577541 -0.181643 -1.872226 0.449009 1.732241 -3.595982 -8.533451 0.220436 -2.180321 -0.074173 0.096063 -1.032245 -0.078232 0.025313 -6.010939 -29.112871 -23.308996 1.236775 -1.060756 -0.704625 2.932163 -0.015652 -10.014409 -6.663840 -0.631097
#4 it Serie A Tammy Abraham eng ENG FW Roma NaN 24-223 -0.086716 -0.820954 -36.723824 0.079826 -2.753469 -22.438179 -9.029194 1.384738 -34.356091 -3.188395 -6.103671 0.384700 -3.148418 -0.218668 -15.856391 -6.018654 -20.550202 -5.727939 1.140876 1.249410 -3.800284 -3.053671 -40.553060 -17.308798 0.717259
CodePudding user response:
Anytime you think "I would have done this using LOOKUP in Excel", you probably want a join
or merge
in pandas:
df1.merge(df2[["comp", "touches"]], on="comp", suffixes=("_df1", "_df2"))
(For future questions, please include your dataframes as text, not screenshots! df.to_clipboard()
will copy the content of the dataframe to clipboard)