Assign values to table 2 by looking up table 1 with non-unique data in Python


I am a new learner of Python. I have two dataframes loaded from xlsx files, and I am trying to bring values from table 1 into table 2.

Table 1:

| Product ID | Inventory Receipt Date | Age Days | Quantity |
|:-----------|:-----------------------|:---------|:---------|
| AA12345678 | Jan 21, 2022 | 120 | 400 |
| AA12345678 | Jan 30, 2022 | 111 | 100 |
| AA12345678 | Jan 31, 2022 | 110 | 20 |
| BB12345678 | Jan 21, 2022 | 120 | 120 |
| BB12345678 | Feb 1, 2022 | 109 | 100 |

Table 2:

| Location Code | Product ID | Required Quantity |
|:--------------|:-----------|:------------------|
| ABCD001 | AA12345678 | 100 |
| ABCD001 | AA12345678 | 401 |
| ABCD002 | AA12345678 | 19 |
| EFGH001 | BB12345678 | 200 |
| EFGH002 | BB12345678 | 20 |

Expected Result:

| Location Code | Product ID | Required Quantity | Age days 1 | Age days 2 | Age days 3 |
|:--------------|:-----------|:------------------|:-----------|:-----------|:-----------|
| ABCD001 | AA12345678 | 100 | 120 | | |
| ABCD001 | AA12345678 | 401 | 120 | 111 | 110 |
| ABCD002 | AA12345678 | 19 | 110 | | |
| EFGH001 | BB12345678 | 200 | 120 | 109 | |
| EFGH002 | BB12345678 | 20 | 109 | | |

The rule of product distribution is first come, first served. For example, 'Location Code' ABCD001 requires 100 qty. of 'Product ID' AA12345678 (the first request row in table 2). The first inventory row for AA12345678 in table 1 supplies those 100 qty., and its 'Age Days' is written to table 2. Once a row's 'Quantity' in table 1 is used up, the lookup moves on to the next row with the same 'Product ID' (AA12345678). The total 'Quantity' in table 1 equals the total 'Required Quantity' in table 2.
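To make the rule concrete, here is a rough pure-Python sketch of the allocation I have in mind (the function and field names are only illustrative, not from my real data):

def allocate(batches, required):
    # Consume inventory batches in order (first come, first served) and
    # return the 'Age Days' of every batch that contributed to this request.
    ages = []
    for batch in batches:
        if required <= 0:
            break
        if batch["qty"] == 0:  # this batch is already used up
            continue
        take = min(batch["qty"], required)
        batch["qty"] -= take
        required -= take
        ages.append(batch["age_days"])
    return ages

# Inventory of AA12345678 from table 1, oldest receipt first
batches = [{"age_days": 120, "qty": 400},
           {"age_days": 111, "qty": 100},
           {"age_days": 110, "qty": 20}]

print(allocate(batches, 100))  # [120]            -> first request in table 2
print(allocate(batches, 401))  # [120, 111, 110]  -> second request
print(allocate(batches, 19))   # [110]            -> third request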

I tried df2.merge(df1[['Product ID', 'Age Days']], 'left'), but 'Age Days' did not merge onto df2 the way I expected. I also tried the map function (df2['Age Days'] = df2['Product ID'].map(df1.set_index('Product ID')['Age Days'])), but it raised an error complaining that the index is not uniquely valued.
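Here is a minimal reproduction with small stand-in frames (the values are illustrative, not my real data):

import pandas as pd

# Stand-in frames with a non-unique 'Product ID'
df1 = pd.DataFrame({"Product ID": ["A", "A", "B"], "Age Days": [120, 111, 109]})
df2 = pd.DataFrame({"Product ID": ["A", "B"], "Required Quantity": [100, 20]})

# merge runs, but a non-unique key duplicates rows: each df2 row is paired
# with every df1 row that shares its 'Product ID'
print(df2.merge(df1, on="Product ID", how="left"))

# map requires a uniquely valued index, so this raises the error I mentioned
try:
    df2["Age Days"] = df2["Product ID"].map(df1.set_index("Product ID")["Age Days"])
except Exception as err:
    print(err)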

Issue: 'Product ID' is non-unique, so lookup/map/merge on it does not work directly. How can I get all matching results (or their indices) with lookup/map/merge or some other method? In that case, I would also need a flag such as "is_empty" to check; if "is_empty" == yes, I need to take the value from the next matched row.

I know this case is complex; could you let me know a way to solve it? I am not sure what keywords I should study for it. Thank you.

CodePudding user response:

After converting your comments into program logic, I ended up with the following:

table 1 file

|Product ID|Inventory Receipt Date|Age Days|Quantity|
|:---------|:---------------------|:-------|:-------|
|AA12345678|Jan 21, 2022|120|400|
|AA12345678|Jan 30, 2022|111|100|
|AA12345678|Jan 31, 2022|110|20|
|BB12345678|Jan 21, 2022|120|120|
|BB12345678|Feb 1, 2022|109|100|

table 2 file

|Location Code|Product ID|Required Quantity|
|-------------|----------|-----------------|
|ABCD001|AA12345678|100|
|ABCD001|AA12345678|401|
|ABCD002|AA12345678|19|
|EFGH001|BB12345678|200|
|EFGH002|BB12345678|20|

code

import pandas as pd
import numpy as np

# convert first table from markdown to dataframe
df1 = pd.read_csv('table1', sep="|")
df1.drop(df1.columns[len(df1.columns)-1], axis=1, inplace=True)
df1.drop(df1.columns[0], axis=1, inplace=True)
df1 = df1.iloc[1:]
df1 = df1.reset_index()

# convert second table from markdown to dataframe
df2 = pd.read_csv('table2', sep="|")
df2.drop(df2.columns[len(df2.columns)-1], axis=1, inplace=True)
df2.drop(df2.columns[0], axis=1, inplace=True)
df2 = df2.iloc[1:]
df2 = df2.reset_index()

# Convert data type of quantities to integer
df1["Quantity"] = pd.to_numeric(df1["Quantity"])
df2["Required Quantity"] = pd.to_numeric(df2["Required Quantity"])

# create an extra column in df2
df2["Age 1"] = np.nan
df2_extra_col = 1

for i2, row2 in df2.iterrows():
    age_col = 1  # current age column
    product_id = row2["Product ID"]
    required_quantity = row2["Required Quantity"]

    # search for this product id in df1
    for i1, row1 in df1.iterrows():
        available_quantity = row1["Quantity"]

        if row1["Product ID"] == product_id:
            if available_quantity == 0:  # skip 0 quantity rows
                continue
            if available_quantity < required_quantity:  # insufficient quantity
                required_quantity -= available_quantity
                df1.loc[i1, "Quantity"] = 0  # take everything
                df2.loc[i2, "Age " + str(age_col)] = row1["Age Days"]

                # add another column to df2 if missing
                age_col += 1
                if age_col > df2_extra_col:
                    df2["Age " + str(age_col)] = np.nan
                    df2_extra_col += 1

                continue

            else:  # single delivery enough
                df2.loc[i2, "Age " + str(age_col)] = row1["Age Days"]
                df1.loc[i1, "Quantity"] -= required_quantity
                break

df2.drop(df2.columns[0], axis=1, inplace=True)
print(df2)

Result

  Location Code  Product ID  Required Quantity Age 1 Age 2 Age 3
0       ABCD001  AA12345678                100   120   NaN   NaN
1       ABCD001  AA12345678                401   120   111   110
2       ABCD002  AA12345678                 19   110   NaN   NaN
3       EFGH001  BB12345678                200   120   109   NaN
4       EFGH002  BB12345678                 20   109   NaN   NaN

Notes

  • I highly recommend using single-stepping and breakpoints to debug my code so you can see what happens on each line. Let me know if anything is unclear.
  • My file contents were in markdown, so some pre-processing was needed to convert them into dataframes. If your files are already in csv, you can load them directly into a dataframe with pd.read_csv; since yours come from xlsx, pd.read_excel does the same job (see the sketch below).
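For example, a minimal loading sketch, assuming workbooks named table1.xlsx and table2.xlsx (adjust the file names and sheet_name to your actual files):

import pandas as pd

# Assumed file names and sheet layout; adjust to match the real workbooks
df1 = pd.read_excel("table1.xlsx", sheet_name=0)
df2 = pd.read_excel("table2.xlsx", sheet_name=0)

# Make sure the quantity columns are numeric before running the allocation loop
df1["Quantity"] = pd.to_numeric(df1["Quantity"])
df2["Required Quantity"] = pd.to_numeric(df2["Required Quantity"])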