Home > other >  Create new column with date corresponding to whether a quantity is greater than or equal to another
Create new column with date corresponding to whether a quantity is greater than or equal to another

Time:07-08

The first dataframe shows qty_ordered, representing the quantity of units each customer ordered, and qty_needed, which cumulatively sums the quantity of units after each row to determine how many units must be fulfilled before one particular customer can receive their order. Example: Row 4 who ordered 1, has 7 units ahead of them before they can receive their 1 unit.

qty_ordered   qty_needed
2             2
3             5
1             6
1             7
4             11
3             14
1             15
3             18
20            38

The second dataframe shows shipped_qty, the quantity of units that is shipped on a particular date, ship_date.

shipped_qty   ship_date
10            1/20/2022
24            2/20/2022
42            3/20/2022

I want to combine these datasets by adding a column that displays the ship date of where shipped_qty is greater than or equal to the qty_needed.

qty_ordered   qty_needed  ship_date
2             2           1/20/2022
3             5           1/20/2022
1             6           1/20/2022
1             7           1/20/2022
4             11          2/20/2022
3             14          2/20/2022
1             15          2/20/2022
3             18          2/20/2022
20            38          3/20/2022

I first tried to use a simple for loop statement, but this just return all of the ship dates and not the ones I need. In Excel, I would've done an IF statement, if qty_needed <= shipped_qty, return ship_date.

CodePudding user response:

You can use pandas.merge_asof for this kind of task. Assuming that your DataFrame of "qty_ordered" and "qty_needed" is called df and the DataFrame of "shipped_qty" is called df_dates, the following line of code will do the trick:

pd.merge_asof(df, df_dates, left_on='qty_needed', right_on='shipped_qty', direction='forward')

CodePudding user response:

here is one way to do it

from pandasql import sqldf

# https://pypi.org/project/pandasql/
pysqldf = lambda q: sqldf(q, globals())

q = """
     SELECT
        df.qty_ordered, df.qty_needed, df2.ship_date
     FROM
        df
     LEFT JOIN
        df2
     WHERE df.qty_needed < df2.shipped_qty
"""
df3=pysqldf(q)
df3.drop_duplicates(subset=['qty_ordered','qty_needed'], keep='first')

df_final = df3.drop_duplicates(subset=['qty_ordered','qty_needed'], keep='first')

    qty_ordered     qty_needed  ship_date
0             2             2   1/20/2022
3             3             5   1/20/2022
6             1             6   1/20/2022
9             1             7   1/20/2022
12            4            11   2/20/2022
14            3            14   2/20/2022
16            1            15   2/20/2022
18            3            18   2/20/2022
20           20            38   3/20/2022
  • Related