Assign values to table 2 by looking up table 1 with non-unique data in Python


I am a new learner of Python. I have two dataframes loaded from xlsx files, and I am trying to bring values from table 1 into table 2.

Table 1:

| Product ID | Inventory Receipt Date | Age Days | Quantity |
|:-----------|:-----------------------|:---------|:---------|
| AA12345678 | Jan 21, 2022 | 120 | 400 |
| AA12345678 | Jan 30, 2022 | 111 | 100 |
| AA12345678 | Jan 31, 2022 | 110 | 20 |
| BB12345678 | Jan 21, 2022 | 120 | 120 |
| BB12345678 | Feb 1, 2022 | 109 | 100 |

Table 2:

| Location Code | Product ID | Required Quantity |
|:--------------|:-----------|:------------------|
| ABCD001 | AA12345678 | 100 |
| ABCD001 | AA12345678 | 401 |
| ABCD002 | AA12345678 | 19 |
| EFGH001 | BB12345678 | 200 |
| EFGH002 | BB12345678 | 20 |

Expected Result:

| Location Code | Product ID | Required Quantity | Age days 1 | Age days 2 | Age days 3 |
|:--------------|:-----------|:------------------|:-----------|:-----------|:-----------|
| ABCD001 | AA12345678 | 100 | 120 | | |
| ABCD001 | AA12345678 | 401 | 120 | 111 | 110 |
| ABCD002 | AA12345678 | 19 | 110 | | |
| EFGH001 | BB12345678 | 200 | 120 | 109 | |
| EFGH002 | BB12345678 | 20 | 109 | | |

The rule of product distribution is first come, first served. For example, 'Location Code' ABCD001 requires 100 qty. of 'Product ID' AA12345678 (the first request row in table 2). The first inventory row for AA12345678 in table 1 supplies those 100 qty., and its 'Age Days' is written to table 2. Once a row's 'Quantity' in table 1 is used up, the lookup moves on to the next row with the same 'Product ID' (AA12345678). The total 'Quantity' in table 1 equals the total 'Required Quantity' in table 2.
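To make the rule concrete, here is a rough pure-Python sketch of the allocation I have in mind (the function and field names are only illustrative, not from my real data):

def allocate(batches, required):
    # Consume inventory batches in order (first come, first served) and
    # return the 'Age Days' of every batch that contributed to this request.
    ages = []
    for batch in batches:
        if required <= 0:
            break
        if batch["qty"] == 0:  # this batch is already used up
            continue
        take = min(batch["qty"], required)
        batch["qty"] -= take
        required -= take
        ages.append(batch["age_days"])
    return ages

# Inventory of AA12345678 from table 1, oldest receipt first
batches = [{"age_days": 120, "qty": 400},
           {"age_days": 111, "qty": 100},
           {"age_days": 110, "qty": 20}]

print(allocate(batches, 100))  # [120]            -> first request in table 2
print(allocate(batches, 401))  # [120, 111, 110]  -> second request
print(allocate(batches, 19))   # [110]            -> third request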

I tried df2.merge(df1[['Product ID', 'Age Days']], 'left'), but 'Age Days' did not merge onto df2 the way I expected. I also tried the map function (df2['Age Days'] = df2['Product ID'].map(df1.set_index('Product ID')['Age Days'])), but it raised an error complaining that the index is not uniquely valued.
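Here is a minimal reproduction with small stand-in frames (the values are illustrative, not my real data):

import pandas as pd

# Stand-in frames with a non-unique 'Product ID'
df1 = pd.DataFrame({"Product ID": ["A", "A", "B"], "Age Days": [120, 111, 109]})
df2 = pd.DataFrame({"Product ID": ["A", "B"], "Required Quantity": [100, 20]})

# merge runs, but a non-unique key duplicates rows: each df2 row is paired
# with every df1 row that shares its 'Product ID'
print(df2.merge(df1, on="Product ID", how="left"))

# map requires a uniquely valued index, so this raises the error I mentioned
try:
    df2["Age Days"] = df2["Product ID"].map(df1.set_index("Product ID")["Age Days"])
except Exception as err:
    print(err)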

Issue: 'Product ID' is non-unique, so lookup/map/merge on it does not work directly. How can I get all matching results (or their indices) with lookup/map/merge or some other method? In that case, I would also need a flag such as "is_empty" to check; if "is_empty" == yes, I need to take the value from the next matched row.

I know this case is complex; could you let me know a way to solve it? I am not sure what keywords I should study for it. Thank you.

CodePudding user response:

After converting your comments into program logic, I ended up with the following:

table 1 file

|Product ID|Inventory Receipt Date|Age Days|Quantity|
|:---------|:---------------------|:-------|:-------|
|AA12345678|Jan 21, 2022|120|400|
|AA12345678|Jan 30, 2022|111|100|
|AA12345678|Jan 31, 2022|110|20|
|BB12345678|Jan 21, 2022|120|120|
|BB12345678|Feb 1, 2022|109|100|

table 2 file

|Location Code|Product ID|Required Quantity|
|-------------|----------|-----------------|
|ABCD001|AA12345678|100|
|ABCD001|AA12345678|401|
|ABCD002|AA12345678|19|
|EFGH001|BB12345678|200|
|EFGH002|BB12345678|20|

code

import pandas as pd
import numpy as np

# convert first table from markdown to dataframe
df1 = pd.read_csv('table1', sep="|")
df1.drop(df1.columns[len(df1.columns)-1], axis=1, inplace=True)
df1.drop(df1.columns[0], axis=1, inplace=True)
df1 = df1.iloc[1:]
df1 = df1.reset_index()

# convert second table from markdown to dataframe
df2 = pd.read_csv('table2', sep="|")
df2.drop(df2.columns[len(df2.columns)-1], axis=1, inplace=True)
df2.drop(df2.columns[0], axis=1, inplace=True)
df2 = df2.iloc[1:]
df2 = df2.reset_index()

# Convert data type of quantities to integer
df1["Quantity"] = pd.to_numeric(df1["Quantity"])
df2["Required Quantity"] = pd.to_numeric(df2["Required Quantity"])

# create an extra column in df2
df2["Age 1"] = np.nan
df2_extra_col = 1

for i2, row2 in df2.iterrows():
    age_col = 1  # current age column
    product_id = row2["Product ID"]
    required_quantity = row2["Required Quantity"]

    # search for this product id in df1
    for i1, row1 in df1.iterrows():
        available_quantity = row1["Quantity"]

        if row1["Product ID"] == product_id:
            if available_quantity == 0:  # skip 0 quantity rows
                continue
            if available_quantity < required_quantity:  # insufficient quantity
                required_quantity -= available_quantity
                df1.loc[i1, "Quantity"] = 0  # take everything
                df2.loc[i2, "Age " + str(age_col)] = row1["Age Days"]

                # add another column to df2 if missing
                age_col += 1
                if age_col > df2_extra_col:
                    df2["Age " + str(age_col)] = np.nan
                    df2_extra_col += 1

                continue

            else:  # single delivery enough
                df2.loc[i2, "Age " + str(age_col)] = row1["Age Days"]
                df1.loc[i1, "Quantity"] -= required_quantity
                break

df2.drop(df2.columns[0], axis=1, inplace=True)
print(df2)

Result

  Location Code  Product ID  Required Quantity Age 1 Age 2 Age 3
0       ABCD001  AA12345678                100   120   NaN   NaN
1       ABCD001  AA12345678                401   120   111   110
2       ABCD002  AA12345678                 19   110   NaN   NaN
3       EFGH001  BB12345678                200   120   109   NaN
4       EFGH002  BB12345678                 20   109   NaN   NaN

Notes

  • I highly recommend using single-stepping and breakpoints to debug my code so you can see what happens on each line. Let me know if anything is unclear.
  • My file contents were in markdown, so some pre-processing was needed to convert them into dataframes. If your files are already in csv, you can load them directly into a dataframe with pd.read_csv; since yours come from xlsx, pd.read_excel does the same job (see the sketch below).
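For example, a minimal loading sketch, assuming workbooks named table1.xlsx and table2.xlsx (adjust the file names and sheet_name to your actual files):

import pandas as pd

# Assumed file names and sheet layout; adjust to match the real workbooks
df1 = pd.read_excel("table1.xlsx", sheet_name=0)
df2 = pd.read_excel("table2.xlsx", sheet_name=0)

# Make sure the quantity columns are numeric before running the allocation loop
df1["Quantity"] = pd.to_numeric(df1["Quantity"])
df2["Required Quantity"] = pd.to_numeric(df2["Required Quantity"])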