Home > front end >  Find matching name in another table, return value associated w/ column in pandas
Find matching name in another table, return value associated w/ column in pandas

Time:05-15

I have 2 tables. I want to take DF1 and adjust the values in the tables given the values in DF2. DF2 is simply a groupby of a column in DF1. In domain terms, I am trying to adjust the players statistics in DF1 for difficulty of the league in DF2. I'm having trouble explaining my question so I provided pictures of the tables to try and help. You'll see that DF1 has column 'comp' which corresponds to DF2. For DF1 I want to find "comp" in the index of DF2 and return the value associated w/ the column.

Could a variation of this solve my problem? DF1

DF2

CodePudding user response:

Here is a solution! It involves stacking df1 and df2 so that they can be matched on index and column. As df1's "comp" has duplicate values, they needed to be merged and then unstacked one by one (as unstack doesn't work with duplicate values).

Data:

import pandas as pd

df1 = pd.DataFrame({'player': {1: 'Yunis Abdelhamid', 2: 'Salis Abdul Samed', 3: 'Laurent Abergel', 5: 'Dickson Abiama', 7: 'Tammy Abraham'}, 'nation': {1: 'ma MAR', 2: 'gh GHA', 3: 'fr FRA', 5: 'ng NGA', 7: 'eng ENG'}, 'pos': {1: 'DF', 2: 'MF', 3: 'MF', 5: 'FW', 7: 'FW'}, 'squad': {1: 'Reims', 2: 'Clermont Foot', 3: 'Lorient', 5: 'Greuther Fürth', 7: 'Roma'}, 'comp': {1: 'fr Ligue 1', 2: 'fr Ligue 1', 3: 'fr Ligue 1', 5: 'de Bundesliga', 7: 'it Serie A'}, 'age': {1: '34-227', 2: '22-048', 3: '29-101', 5: '23-191', 7: '24-223'}, 'born': {1: 1987.0, 2: 2000.0, 3: 1993.0, 5: 1998.0, 7: 1997.0}, '90s': {1: 31.1, 2: 27.4, 3: 30.9, 5: 8.1, 7: 32.3}, 'touches': {1: 1784, 2: 1930, 3: 1881, 5: 267, 7: 1055}, 'def_pen': {1: 249, 2: 55, 3: 22, 5: 9, 7: 33}, 'def_3rd': {1: 1007, 2: 621, 3: 428, 5: 28, 7: 85}, 'mid_3rd': {1: 817, 2: 1144, 3: 1229, 5: 126, 7: 536}, 'att_3rd': {1: 90, 2: 299, 3: 345, 5: 126, 7: 489}, 'att_pen': {1: 27, 2: 17, 3: 15, 5: 31, 7: 186}, 'live': {1: 1693, 2: 1914, 3: 1860, 5: 258, 7: 991}, 'succ': {1: 15, 2: 27, 3: 41, 5: 6, 7: 36}, 'att': {1: 21, 2: 42, 3: 63, 5: 18, 7: 75}, 'succ%': {1: 71.4, 2: 64.3, 3: 65.1, 5: 33.3, 7: 48.0}, '#pl': {1: 15, 2: 30, 3: 43, 5: 7, 7: 38}, 'megs': {1: 1, 2: 2, 3: 3, 5: 1, 7: 3}, 'carries': {1: 1113, 2: 1467, 3: 1384, 5: 154, 7: 582}, 'totdist': {1: 6346, 2: 6755, 3: 5244, 5: 605, 7: 2464}, 'prgdist': {1: 3563, 2: 2912, 3: 2639, 5: 326, 7: 1249}, 'prog': {1: 12, 2: 47, 3: 55, 5: 48, 7: 215}, '1/3': {1: 23, 2: 20, 3: 34, 5: 8, 7: 26}, 'cpa': {1: 0, 2: 4, 3: 3, 5: 7, 7: 22}, 'mis': {1: 14, 2: 23, 3: 28, 5: 41, 7: 77}, 'dis': {1: 13, 2: 40, 3: 45, 5: 11, 7: 42}, 'targ': {1: 1154, 2: 1605, 3: 1394, 5: 335, 7: 1340}, 'rec': {1: 1120, 2: 1484, 3: 1299, 5: 171, 7: 726}, 'rec%': {1: 97.1, 2: 92.5, 3: 93.2, 5: 51.0, 7: 54.2}})
df2 = pd.DataFrame({pd.NA: {'de Bundesliga': 0.04705106010159299, 'eng Premier League': 0.03905863519731234, 'es La Liga': -0.019957692390575543, 'fr Ligue 1': -0.07678625197206412, 'it Serie A': 0.018426938326577158}, 'born': {'de Bundesliga': 0.00028905941112555844, 'eng Premier League': -9.456827960363512e-05, 'es La Liga': -0.00047840483346683893, 'fr Ligue 1': 0.00038106934561454864, 'it Serie A': -4.342300278559197e-05}, '90s': {'de Bundesliga': -0.02242504625114483, 'eng Premier League': 0.048731922976664066, 'es La Liga': -0.0149905066318059, 'fr Ligue 1': 0.013935130334292944, 'it Serie A': -0.025416536073811447}, 'touches': {'de Bundesliga': -0.007012081552341343, 'eng Premier League': 0.050874969806333326, 'es La Liga': -0.04893072600904547, 'fr Ligue 1': 0.04334758542412609, 'it Serie A': -0.034809311810162336}, 'def_pen': {'de Bundesliga': 0.04988992932781988, 'eng Premier League': 0.05881763373065385, 'es La Liga': -0.05579752269373517, 'fr Ligue 1': -0.04481705729566732, 'it Serie A': 0.0024189734337858315}, 'def_3rd': {'de Bundesliga': 0.06186573480297897, 'eng Premier League': 0.0389278556215269, 'es La Liga': -0.07767188775294054, 'fr Ligue 1': 0.023015581752473535, 'it Serie A': -0.032393749456107734}, 'mid_3rd': {'de Bundesliga': -0.028539540123693752, 'eng Premier League': 0.016639910674564273, 'es La Liga': -0.04426832916843915, 'fr Ligue 1': 0.09747239212657255, 'it Serie A': -0.0418622749313895}, 'att_3rd': {'de Bundesliga': -0.0677258003546225, 'eng Premier League': 0.1278948628925125, 'es La Liga': -0.024569324379658197, 'fr Ligue 1': -0.021387126771416076, 'it Serie A': -0.018464610046065477}, 'att_pen': {'de Bundesliga': 0.007110827172613909, 'eng Premier League': 0.14160451509170202, 'es La Liga': -0.0611734650276724, 'fr Ligue 1': -0.0881088201338538, 'it Serie A': 0.007444826761989098}, 'live': {'de Bundesliga': -0.008450858104324888, 'eng Premier League': 0.05603909437288501, 'es La Liga': -0.05494276127792341, 'fr Ligue 1': 0.04560568557076894, 'it Serie A': -0.03466810441351198}, 'succ': {'de Bundesliga': -0.012362165147406401, 'eng Premier League': 0.06866777074508845, 'es La Liga': -0.055637228100598124, 'fr Ligue 1': 0.09372007504093172, 'it Serie A': -0.08856653505382717}, 'att': {'de Bundesliga': 0.005336824355495873, 'eng Premier League': 0.0587320294194591, 'es La Liga': -0.05149158091787276, 'fr Ligue 1': 0.0764454862643873, 'it Serie A': -0.08138227424215116}, 'succ%': {'de Bundesliga': -0.030998334917095582, 'eng Premier League': 0.0006975699587437045, 'es La Liga': -0.0020114333371678006, 'fr Ligue 1': 0.01996310166029347, 'it Serie A': 0.008014587372980442}, '#pl': {'de Bundesliga': -0.011176015456384025, 'eng Premier League': 0.06178559510622872, 'es La Liga': -0.053111868552582875, 'fr Ligue 1': 0.0908085024342733, 'it Serie A': -0.08285310712659288}, 'megs': {'de Bundesliga': 0.0253126184160668, 'eng Premier League': 0.09265353935404708, 'es La Liga': 0.0054787317945212965, 'fr Ligue 1': -0.040851040851040854, 'it Serie A': -0.07288946347769876}, 'carries': {'de Bundesliga': -0.03903207021173538, 'eng Premier League': 0.05232455323100438, 'es La Liga': -0.06356832873867313, 'fr Ligue 1': 0.07645657929266236, 'it Serie A': -0.02724465858512981}, 'totdist': {'de Bundesliga': -0.04812044800721793, 'eng Premier League': 0.04704197170020219, 'es La Liga': -0.07132521472124087, 'fr Ligue 1': 0.07140894086202842, 'it Serie A': -0.002442635578199015}, 'prgdist': {'de Bundesliga': -0.0714999888726322, 'eng Premier League': 0.05420271969861634, 'es La Liga': -0.06654412613544947, 'fr Ligue 1': 0.09418730246570628, 'it Serie A': -0.016453324264714242}, 'prog': {'de Bundesliga': 0.025766138682432826, 'eng Premier League': 0.04259129468490075, 'es La Liga': -0.05473368088233732, 'fr Ligue 1': 0.0207709163158738, 'it Serie A': -0.026641578115484843}, '1/3': {'de Bundesliga': -0.13259455956555866, 'eng Premier League': 0.09116541850170368, 'es La Liga': -0.0502927350878537, 'fr Ligue 1': 0.03095126212099486, 'it Serie A': 0.04387983864146161}, 'cpa': {'de Bundesliga': -0.10066067674087131, 'eng Premier League': 0.22614740634523578, 'es La Liga': -0.09419357940409367, 'fr Ligue 1': -0.0954191127746733, 'it Serie A': 0.05679137057883232}, 'mis': {'de Bundesliga': 0.07151617153965573, 'eng Premier League': -0.01747840767010267, 'es La Liga': -0.03129327808221427, 'fr Ligue 1': 0.03944729994673235, 'it Serie A': -0.04935434332643396}, 'dis': {'de Bundesliga': -0.0014228976061350096, 'eng Premier League': 0.06993068622265519, 'es La Liga': -0.062821726256443, 'fr Ligue 1': 0.0739611904587274, 'it Serie A': -0.07270645869717707}, 'targ': {'de Bundesliga': -0.02989375871600719, 'eng Premier League': 0.0619179241550869, 'es La Liga': -0.062229984078524536, 'fr Ligue 1': 0.06113385051717368, 'it Serie A': -0.030263477410824247}, 'rec': {'de Bundesliga': -0.03896982275664396, 'eng Premier League': 0.04868701591904645, 'es La Liga': -0.06490963275976946, 'fr Ligue 1': 0.0777484061789595, 'it Serie A': -0.023841319468341227}, 'rec%': {'de Bundesliga': -0.01237444148958411, 'eng Premier League': -0.0104537113692863, 'es La Liga': -0.005015404228401965, 'fr Ligue 1': 0.012032534271996465, 'it Serie A': 0.013233565733276276}})

Stacking df2. To merge on the index, multi-index columns need to have the same names.

# stacked df2 to to make it possible to match on index and column
df4 = df2.stack().to_frame()
df4.index.names = ["comp", "col"]

Merge and multiply:

# output dataframe
df5 = pd.DataFrame(columns=df1.columns)

# for each row in df1
for i in range(0, len(df1)):
    # stack each row of the df1
    p = df1.iloc[i:i 1].set_index("comp").stack().to_frame()
    p.index.names = ["comp", "col"]
    # merge with df4, and multiply row, unstack to a dataframe, convert to list and append
    df5 = df5.append(p.merge(df4, left_index=True, right_index=True, how="left").prod(axis=1).unstack().iloc[0])

# reset the index so that "comp" is a column not the index
df5.reset_index(drop=False, inplace=True)
df5.rename(columns={"index": "comp"}, inplace=True)

Output:

df5
#            comp             player   nation pos           squad  comp     age      born       90s    touches    def_pen    def_3rd     mid_3rd   att_3rd   att_pen       live      succ       att     succ%       #pl      megs     carries     totdist     prgdist      prog       1/3       cpa       mis       dis       targ         rec      rec%
#0     fr Ligue 1   Yunis Abdelhamid   ma MAR  DF           Reims   NaN  34-227  0.757185  0.433383  77.332092 -11.159447  23.176691   79.634944 -1.924841 -2.378938  77.210426  1.405801  1.605355  1.425365  1.362128 -0.040851   85.096173  453.161139  335.589359  0.249251  0.711879 -0.000000  0.552262  0.961495  70.548463   87.078215  1.168359
#1     fr Ligue 1  Salis Abdul Samed   gh GHA  MF   Clermont Foot   NaN  22-048  0.762139  0.381823  83.660840  -2.464938  14.292676  111.508417 -6.394751 -1.497850  87.289282  2.530442  3.210710  1.283627  2.724255 -0.081702  112.161802  482.367396  274.273425  0.976233  0.619025 -0.381676  0.907288  2.958448  98.119830  115.378635  1.113009
#2     fr Ligue 1    Laurent Abergel   fr FRA  MF         Lorient   NaN  29-101  0.759471  0.430596  81.536808  -0.985975   9.850669  119.793570 -7.378559 -1.321632  84.826575  3.842523  4.816066  1.299598  3.904766 -0.122553  105.815906  374.468486  248.560291  1.142400  1.052343 -0.286257  1.104524  3.328254  85.220588  100.995180  1.121432
#3  de Bundesliga     Dickson Abiama   ng NGA  FW  Greuther Fürth   NaN  23-191  0.577541 -0.181643  -1.872226   0.449009   1.732241   -3.595982 -8.533451  0.220436  -2.180321 -0.074173  0.096063 -1.032245 -0.078232  0.025313   -6.010939  -29.112871  -23.308996  1.236775 -1.060756 -0.704625  2.932163 -0.015652 -10.014409   -6.663840 -0.631097
#4     it Serie A      Tammy Abraham  eng ENG  FW            Roma   NaN  24-223 -0.086716 -0.820954 -36.723824   0.079826  -2.753469  -22.438179 -9.029194  1.384738 -34.356091 -3.188395 -6.103671  0.384700 -3.148418 -0.218668  -15.856391   -6.018654  -20.550202 -5.727939  1.140876  1.249410 -3.800284 -3.053671 -40.553060  -17.308798  0.717259

CodePudding user response:

Anytime you think "I would have done this using LOOKUP in Excel", you probably want a join or merge in pandas:

df1.merge(df2[["comp", "touches"]], on="comp", suffixes=("_df1", "_df2"))

(For future questions, please include your dataframes as text, not screenshots! df.to_clipboard() will copy the content of the dataframe to clipboard)

  • Related