The first dataframe shows qty_ordered, representing the quantity of units each customer ordered, and qty_needed, which cumulatively sums the quantity of units after each row to determine how many units must be fulfilled before one particular customer can receive their order. Example: Row 4 who ordered 1, has 7 units ahead of them before they can receive their 1 unit.
qty_ordered qty_needed
2 2
3 5
1 6
1 7
4 11
3 14
1 15
3 18
20 38
The second dataframe shows shipped_qty, the quantity of units that is shipped on a particular date, ship_date.
shipped_qty ship_date
10 1/20/2022
24 2/20/2022
42 3/20/2022
I want to combine these datasets by adding a column that displays the ship date of where shipped_qty is greater than or equal to the qty_needed.
qty_ordered qty_needed ship_date
2 2 1/20/2022
3 5 1/20/2022
1 6 1/20/2022
1 7 1/20/2022
4 11 2/20/2022
3 14 2/20/2022
1 15 2/20/2022
3 18 2/20/2022
20 38 3/20/2022
I first tried to use a simple for loop statement, but this just return all of the ship dates and not the ones I need. In Excel, I would've done an IF statement, if qty_needed <= shipped_qty, return ship_date.
CodePudding user response:
You can use pandas.merge_asof
for this kind of task. Assuming that your DataFrame of "qty_ordered" and "qty_needed" is called df
and the DataFrame of "shipped_qty" is called df_dates
, the following line of code will do the trick:
pd.merge_asof(df, df_dates, left_on='qty_needed', right_on='shipped_qty', direction='forward')
CodePudding user response:
here is one way to do it
from pandasql import sqldf
# https://pypi.org/project/pandasql/
pysqldf = lambda q: sqldf(q, globals())
q = """
SELECT
df.qty_ordered, df.qty_needed, df2.ship_date
FROM
df
LEFT JOIN
df2
WHERE df.qty_needed < df2.shipped_qty
"""
df3=pysqldf(q)
df3.drop_duplicates(subset=['qty_ordered','qty_needed'], keep='first')
df_final = df3.drop_duplicates(subset=['qty_ordered','qty_needed'], keep='first')
qty_ordered qty_needed ship_date
0 2 2 1/20/2022
3 3 5 1/20/2022
6 1 6 1/20/2022
9 1 7 1/20/2022
12 4 11 2/20/2022
14 3 14 2/20/2022
16 1 15 2/20/2022
18 3 18 2/20/2022
20 20 38 3/20/2022