I am new learner for python. I have two dataframe loaded from xlsx file in python, and tried to get the value to table 2 from table 1
Table 1:
Product ID | Inventory Receipt Date | Age Days | Quantity |
---|---|---|---|
AA12345678 | Jan 21, 2022 | 120 | 400 |
AA12345678 | Jan 30, 2022 | 111 | 100 |
AA12345678 | Jan 31, 2022 | 110 | 20 |
BB12345678 | Jan 21, 2022 | 120 | 120 |
BB12345678 | Feb 1, 2022 | 109 | 100 |
Table 2:
Location Code | Product ID | Required Quantity |
---|---|---|
ABCD001 | AA12345678 | 100 |
ABCD001 | AA12345678 | 401 |
ABCD002 | AA12345678 | 19 |
EFGH001 | BB12345678 | 200 |
EFGH002 | BB12345678 | 20 |
Expected Result:
Location Code | Product ID | Required Quantity | Age days 1 | Age days 2 | Age days 3 |
---|---|---|---|---|---|
ABCD001 | AA12345678 | 100 | 120 | ||
ABCD001 | AA12345678 | 401 | 120 | 111 | 110 |
ABCD002 | AA12345678 | 19 | 110 | ||
EFGH001 | BB12345678 | 200 | 120 | 109 | |
EFGH002 | BB12345678 | 20 | 109 |
The rule of product distribution is first come first served. For example, 'Location Code' ABCD001 require 100 qty. ('Product ID'= 'AA12345678')on row 2 in table 2. It will distribute 100 qty. ('Product ID'='AA12345678') to row 1 and get the 'Age days' to table 2. When the 'Quantity' on row 2 in table 1 is empty, it will lookup row 3 (with same 'Product ID'=AA12345678). The total number of 'Quantity' in table 1 is same as table 2.
I tried to use df2.merge(df1[['Product ID', 'Age Days']], 'left'), but 'Age Days' cannot merge to df2. And tried to use map function (df2['Age Days'] = df2['Product ID'].map(df1.set_index('Product ID')['Age Days'])), but it occur error "uniquely value".
Issue: 'Product ID' is non-unique for lookup/map/merge. How could get all results or its index by lookup/map/merge/other method? In that case, I need to set a flag called "is_empty" for checking, if "is_empty" == yes: I need to get the value from next matched row
I know that case is complex, could you let me know the way to solve it? I am confusing what keywords should I use to study for it. Thank you.
CodePudding user response:
After converting your comments into program logic, I ended up with the following :
table 1 file
|Product ID|Inventory Receipt Date|Age Days|Quantity|
|:---------|:---------------------|:-------|:-------|
|AA12345678|Jan 21, 2022|120|400|
|AA12345678|Jan 30, 2022|111|100|
|AA12345678|Jan 31, 2022|110|20|
|BB12345678|Jan 21, 2022|120|120|
|BB12345678|Feb 1, 2022|109|100|
table 2 file
|Location Code|Product ID|Required Quantity|
|-------------|----------|-----------------|
|ABCD001|AA12345678|100|
|ABCD001|AA12345678|401|
|ABCD002|AA12345678|19|
|EFGH001|BB12345678|200|
|EFGH002|BB12345678|20|
code
import pandas as pd
import numpy as np
# convert first table from markdown to dataframe
df1 = pd.read_csv('table1', sep="|")
df1.drop(df1.columns[len(df1.columns)-1], axis=1, inplace=True)
df1.drop(df1.columns[0], axis=1, inplace=True)
df1 = df1.iloc[1:]
df1 = df1.reset_index()
# convert second table from markdown to dataframe
df2 = pd.read_csv('table2', sep="|")
df2.drop(df2.columns[len(df2.columns)-1], axis=1, inplace=True)
df2.drop(df2.columns[0], axis=1, inplace=True)
df2 = df2.iloc[1:]
df2 = df2.reset_index()
# Convert data type of quantities to integer
df1["Quantity"] = pd.to_numeric(df1["Quantity"])
df2["Required Quantity"] = pd.to_numeric(df2["Required Quantity"])
# create an extra column in df2
df2["Age 1"] = np.nan
df2_extra_col = 1
for i2, row2 in df2.iterrows():
age_col = 1 # current age column
product_id = row2["Product ID"]
required_quantity = row2["Required Quantity"]
# search for this product id in df1
for i1, row1 in df1.iterrows():
available_quantity = row1["Quantity"]
if row1["Product ID"] == product_id:
if available_quantity == 0: # skip 0 quantity rows
continue
if available_quantity < required_quantity: # insufficient quantity
required_quantity -= available_quantity
df1.loc[i1, "Quantity"] = 0 # take everything
df2.loc[i2, "Age " str(age_col)] = row1["Age Days"]
# add another column to df2 if missing
age_col = 1
if(age_col > df2_extra_col):
df2["Age " str(age_col)] = np.nan
df2_extra_col = 1
continue
else: # single delivery enough
df2.loc[i2, "Age " str(age_col)] = row1["Age Days"]
df1.loc[i1, "Quantity"] -= required_quantity
break
df2.drop(df2.columns[0], axis=1, inplace=True)
print(df2)
Result
Location Code Product ID Required Quantity Age 1 Age 2 Age 3
0 ABCD001 AA12345678 100 120 NaN NaN
1 ABCD001 AA12345678 401 120 111 110
2 ABCD002 AA12345678 19 110 NaN NaN
3 EFGH001 BB12345678 200 120 109 NaN
4 EFGH002 BB12345678 20 109 NaN NaN
Notes
- I will highly recommend you to use single stepping and breakpoints to debug my code to understand what's happening on each line. Let me know if there's anything unclear.
- My file contents were in markdown so there was some pre-processing to do to convert it to dataframe. If your files are already in csv, you can convert your files directly to a dataframe using
df.read_csv
.