Home > Software design >  How do you lookup a particular pandas dataframe column value in a reference table and copy a referen
How do you lookup a particular pandas dataframe column value in a reference table and copy a referen

Time:07-11

I have a reference table that I imported into a dataframe(df2) from a .csv. It's 3 columns and around 400 rows. I have another dataframe (df) that has many columns and rows. I am looking to lookup a value from the reference table and add it to the appropriate column in df.

The data format for the reference table:

MANUF   PRODTYPE        PRODCODE
 
ALPHA       1           ALPHA1
ALPHA       2           ALPHA2
BETA        1           BETA1
BETA        2           BETA2
DELTA       1           DELTA1
DELTA       2           DELTA2

The dataframe (df) is set up like this:

MANUF    PRODTYPE    SERIALNO   PRODCODE    INVENTORY   
ALPHA       1        00001                      5
ALPHA       2        00001                      3
BETA        1        00001                      4
DELTA       1        00001                      8
ALPHA       2        00002                      3
BETA        1        00002                      4
DELTA       2        00001                      9
DELTA       2        00002                      9
DELTA       1        00002                      8
BETA        2        00001                      12
ALPHA       2        00003                      3

I am trying to populate PRODCODE in df with the appropriate value based on MANUF and PRODTYPE in the reference table.

I tried:

df3 = df.merge(df2, how='left') 

and

df3 = df2.merge(df, how='left')

but both gave me either inaccurate or incomplete merges.

CodePudding user response:

I expect this to work in your case:

import sys
import pandas as pd
from io import StringIO
from datetime import datetime

data1 = StringIO("""MANUF;PRODTYPE;PRODCODE
ALPHA;1;ACME1
ALPHA;2;ACME2
BETA;1;BETA1
BETA;2;BETA2
DELTA;1;DELTA1
DELTA;2;DELTA2
""")
df1 = pd.read_csv(data1, sep=";")
print(df1)

data2 = StringIO("""MANUF;PRODTYPE;SERIALNO;PRODCODE;INVENTORY   
ALPHA;1;00001;5
ALPHA;2;00001;3
BETA;1;00001;4
DELTA;1;00001;8
ALPHA;2;00002;3
BETA;1;00002;4
DELTA;2;00001;9
DELTA;2;00002;9
DELTA;1;00002;8
BETA;2;00001;12
ALPHA;2;00003;3
""")
df2 = pd.read_csv(data2, sep=";")
print(df2)

df3 = df2.merge(df1, on=['MANUF', 'PRODTYPE'], how='left')
print(df3)

Result:

    MANUF  PRODTYPE  SERIALNO  PRODCODE_x  INVENTORY    PRODCODE_y
0   ALPHA         1         1           5           NaN      ACME1
1   ALPHA         2         1           3           NaN      ACME2
2    BETA         1         1           4           NaN      BETA1
3   DELTA         1         1           8           NaN     DELTA1
4   ALPHA         2         2           3           NaN      ACME2
5    BETA         1         2           4           NaN      BETA1
6   DELTA         2         1           9           NaN     DELTA2
7   DELTA         2         2           9           NaN     DELTA2
8   DELTA         1         2           8           NaN     DELTA1
9    BETA         2         1          12           NaN      BETA2
10  ALPHA         2         3           3           NaN      ACME2

CodePudding user response:

Another way without merge would be this:

df2 = df2.set_index(['MANUF', 'PRODTYPE'])
output = df2.combine_first(df1.set_index(['MANUF', 'PRODTYPE'])).reset_index()
print(output)

    MANUF  PRODTYPE  INVENTORY PRODCODE  SERIALNO
0   ALPHA         1          5   ALPHA1         1
1   ALPHA         2          3   ALPHA2         1
2   ALPHA         2          3   ALPHA2         2
3   ALPHA         2          3   ALPHA2         3
4    BETA         1          4    BETA1         1
5    BETA         1          4    BETA1         2
6    BETA         2         12    BETA2         1
7   DELTA         1          8   DELTA1         1
8   DELTA         1          8   DELTA1         2
9   DELTA         2          9   DELTA2         1
10  DELTA         2          9   DELTA2         2

Used Input:

df1 = pd.DataFrame({'MANUF': {0: 'ALPHA',
  1: 'ALPHA',
  2: 'BETA',
  3: 'BETA',
  4: 'DELTA',
  5: 'DELTA'},
 'PRODTYPE': {0: 1, 1: 2, 2: 1, 3: 2, 4: 1, 5: 2},
 'PRODCODE': {0: 'ALPHA1',
  1: 'ALPHA2',
  2: 'BETA1',
  3: 'BETA2',
  4: 'DELTA1',
  5: 'DELTA2'}})

df2 = pd.DataFrame({'MANUF': {0: 'ALPHA',
  1: 'ALPHA',
  2: 'BETA',
  3: 'DELTA',
  4: 'ALPHA',
  5: 'BETA',
  6: 'DELTA',
  7: 'DELTA',
  8: 'DELTA',
  9: 'BETA',
  10: 'ALPHA'},
 'PRODTYPE': {0: 1,
  1: 2,
  2: 1,
  3: 1,
  4: 2,
  5: 1,
  6: 2,
  7: 2,
  8: 1,
  9: 2,
  10: 2},
 'SERIALNO': {0: 1,
  1: 1,
  2: 1,
  3: 1,
  4: 2,
  5: 2,
  6: 1,
  7: 2,
  8: 2,
  9: 1,
  10: 3},
 'INVENTORY': {0: 5,
  1: 3,
  2: 4,
  3: 8,
  4: 3,
  5: 4,
  6: 9,
  7: 9,
  8: 8,
  9: 12,
  10: 3}})
  • Related