Running sum per barcode-location combination along a timeline


I have 2 tables:

1) "first_day", which contains date, location, barcode (SKU) and inventory (quantity). A location is a store and an SKU is an item. There are no duplicate location-SKU combinations, and the table covers only one date, 01/01/2020.

2) "transactions", which contains date, location, SKU, transaction type and quantity. Dates run from 01/02/2020 onwards. Transaction types can be "in", "out" and "sale", which I converted to 1, -1 and -1 respectively and then multiplied by the quantity column to create a "total" column.
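The sign conversion I describe can be sketched like this (a minimal example; the frame and column names here are made up to mirror the description):

```python
import pandas as pd

# hypothetical frame mirroring the described transactions schema
transactions = pd.DataFrame({
    'transaction type': ['in', 'out', 'sale'],
    'quantity': [5, 2, 3],
})
# map each transaction type to a sign, then apply it to the quantity
sign = transactions['transaction type'].map({'in': 1, 'out': -1, 'sale': -1})
transactions['total'] = sign * transactions['quantity']
print(transactions['total'].tolist())  # [5, -2, -3]
```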

first_day table

index date location sku inventory
0 01/01/20 02 2518 0
1 01/01/20 105 425 1
2 01/01/20 02 667 1

transaction table

index date location sku total
0 01/02/20 02 2518 2
1 01/03/20 02 2518 -2
2 01/04/20 02 2518 8
3 01/05/20 02 2518 -3
4 01/02/20 02 667 12
5 01/03/20 02 667 4
6 01/05/20 02 667 -2
7 01/06/20 02 667 3

I use the first_day table for the unique locations and unique SKUs, and for the initial quantity of each item at each location. My goal is to iterate over the transactions table and add or subtract each transaction's quantity to/from the initial quantity from the first_day table. An example result table:

index location sku date quantity
0 02 2518 01/02/20 2 # 2
1 02 2518 01/03/20 0 # -2
2 02 2518 01/04/20 8 # 8
3 02 2518 01/05/20 5 # -3
4 02 667 01/02/20 13 # 12
5 02 667 01/03/20 17 # 4
6 02 667 01/05/20 15 # -2
7 02 667 01/06/20 18 # 3

My code is not working well: I only get one line, containing the first location from locations, the first SKU from skus, a quantity of zero and the date 02/01/2020.

import pandas as pd
import numpy as np

first_day = pd.read_csv(r"C:\Users\Inventory_First_day.csv")
master = pd.read_csv(r"C:\Users\Master_skus.csv")
transactions = pd.read_csv(r"C:\Users\Transactions.csv")

transactions['Transaction Type'] = transactions['Transaction Type'].replace({'SALE': -1, 'OUT': -1, 'IN': 1})
transactions['total'] = transactions['Transaction Type'] * transactions['Quantity']

locations = np.unique(first_day['Stock Location Name'])
skus = np.unique(first_day['SKU'])

def skulocation(first_day, transactions, loc, sk):
    
    for location in locations:
        for sku in skus:
            data = first_day[first_day['Stock Location Name'] == location] #filtering for location
            data = data[data['SKU'] == sku]                  #filtering for sku
            if data.empty:                                   # some combinations do not exist in first_day
                data.loc[0,'Date'] = first_day['Date'].iloc[0]     
                data.loc[0,'Stock Location Name'] = location
                data.loc[0,'SKU'] = skus[0]
                data.loc[0,'Inventory'] = 0
            else:
                pass
            
            data2 = transactions[transactions['Stock Location Name'] == location]    #filtering for location
            data2 = data2[data2['SKU'] == sku][['Date', 'Stock Location Name', 'SKU', 'total']] #filtering for sku
            if data2.empty:                                       # some combinations do not exist in transactions
                data2.loc[0,'Date'] = transactions['Date'].iloc[0]
                data2.loc[0,'Stock Location Name'] = location
                data2.loc[0,'SKU'] = sku
                data2.loc[0,'total'] = 0
            else:
                pass
            new_frame = pd.DataFrame(columns=['location', 'sku','date', 'quantity']) # create new dataframe
            new_frame.loc[0,'quantity'] = data['Inventory'].iloc[0] #initializing with value from first_day 
            for i in range(len(data2)):
                new_frame.loc[i,'location'] = data2['Stock Location Name'].iloc[i]
                new_frame.loc[i, 'sku'] = data2['SKU'].iloc[i]
                new_frame.loc[i,'date'] = data2['Date'].iloc[i]
                new_frame.loc[i, 'quantity'] = new_frame['quantity'].iloc[i] + data2['total'].iloc[i] # add/subtract quantity
    return new_frame

new_frame = skulocation(first_day, transactions, locations, skus)

Thank you guys

P.S. I edited the result table a little. From the first_day table I only need the quantity of each item at each location. For example, I have a chair (item 1) in the following locations: A, B, D, J, Q. Each location has a different amount of this chair. Now I go to the transactions table and, for each location, follow how the amount of this chair has changed over time.

Let's say my starting point is -5, and I have an array of values where each position represents a date: [3, 1, -6, 8]. On the first day I have -5 + 3 = -2, then -2 + 1 = -1, then -1 - 6 = -7, then -7 + 8 = 1, giving [-2, -1, -7, 1].
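This running sum is a cumulative sum plus an offset, which can be sketched with numpy:

```python
import numpy as np

start = -5                      # initial quantity from first_day
deltas = np.array([3, 1, -6, 8])  # one transaction total per date
# running quantity after each date = start + cumulative sum of deltas
running = start + np.cumsum(deltas)
print(running.tolist())  # [-2, -1, -7, 1]
```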

CodePudding user response:

Is this what you're looking for? If not, please add more data. I added some data to the transaction table based on what I understood your ask to be.

import io
import numpy as np
import pandas as pd

data1='''index  date    location    sku inventory(quantity)
0   01/01/20    02  2518    0
1   01/01/20    105 425 1'''
df_first_day = pd.read_csv(io.StringIO(data1), sep=r'\s+', engine='python')
df_first_day

data2='''
index   date    location    sku transaction type    quantity
0   01/02/20    12  2518    sale    14
1   01/02/20    145 4579    in  5
2   01/04/20    02  2518    out 1
3   01/05/20    02  2518    in  20
4   01/06/20    02  2518    in  1
5   01/08/20    02  2518    sale    5
6   01/10/20    02  2518    out 1
7   01/12/20    02  2518    sale    3
'''
df_trx = pd.read_csv(io.StringIO(data2), sep=r'\s+', engine='python')
df_trx['inventory(quantity)'] = np.where(df_trx['transaction type']=='in', df_trx['quantity'], -df_trx['quantity'])
df_trx

df_adj = pd.concat([df_first_day, df_trx[['index', 'date', 'location', 'sku', 'inventory(quantity)']]]).sort_values(['sku', 'location', 'date'])
df_adj['quantity'] = df_adj.groupby(['sku', 'location'])['inventory(quantity)'].cumsum()
df_adj

   index      date  location   sku  inventory(quantity)  quantity
1      1  01/01/20       105   425                    1         1
0      0  01/01/20         2  2518                    0         0
2      2  01/04/20         2  2518                   -1        -1
3      3  01/05/20         2  2518                   20        19
4      4  01/06/20         2  2518                    1        20
5      5  01/08/20         2  2518                   -5        15
6      6  01/10/20         2  2518                   -1        14
7      7  01/12/20         2  2518                   -3        11
0      0  01/02/20        12  2518                  -14       -14
1      1  01/02/20       145  4579                    5         5

Extra If you are looking for just current inventory, you can do something like this

df_current_inv = df_adj.groupby(['sku', 'location']).last().reset_index()
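Since `last()` picks the final row of each group in row order, this assumes `df_adj` is sorted by date within each sku-location group; a minimal sketch with made-up rows:

```python
import pandas as pd

# hypothetical running-quantity frame, one row per transaction
df_adj = pd.DataFrame({
    'sku': [2518, 2518, 425],
    'location': [2, 2, 105],
    'date': ['01/04/20', '01/05/20', '01/01/20'],
    'quantity': [-1, 19, 1],
})
# last row per (sku, location) group = most recent running quantity
df_current_inv = (df_adj.sort_values('date')
                        .groupby(['sku', 'location'])['quantity']
                        .last()
                        .reset_index())
print(df_current_inv.to_dict('records'))
# [{'sku': 425, 'location': 105, 'quantity': 1},
#  {'sku': 2518, 'location': 2, 'quantity': 19}]
```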

CodePudding user response:

I think the problem was specifically the updating of the new_frame variable:

def skulocation(df, df2, loc, sk, new_frame = None, **kwargs):
    
    for location in locations:
        for sku in skus:
            data = df[df['Stock Location Name'] == location] #filtering for location
            data = data[data['SKU'] == sku]                  #filtering for sku
            if data.empty:                                   # some combinations do not exist in first_day
                data.loc[0,'Date'] = df['Date'].iloc[0]     
                data.loc[0,'Stock Location Name'] = location
                data.loc[0,'SKU'] = skus[0]
                data.loc[0,'Inventory'] = 0
            else:
                pass
            
            data2 = df2[df2['Stock Location Name'] == location]    #filtering for location
            data2 = data2[data2['SKU'] == sku][['Date', 'Stock Location Name', 'SKU', 'total']] #filtering for sku
            if data2.empty:                                       # some combinations do not exist in transactions
                data2.loc[0,'Date'] = df2['Date'].iloc[0]
                data2.loc[0,'Stock Location Name'] = location
                data2.loc[0,'SKU'] = sku
                data2.loc[0,'total'] = 0
            else:
                pass
            column_names = ['location', 'sku','date', 'quantity']
            if new_frame is None:
              new_frame = pd.DataFrame({column_names[0]: data2['Stock Location Name'].iloc[0], column_names[1]: data2['SKU'].iloc[0], column_names[2]: data2['Date'].iloc[0], column_names[3]: data['Inventory'].iloc[0]}, index=[0])
            else:
              i = len(data2)
              new_frame = pd.DataFrame({column_names[0]: data2['Stock Location Name'].iloc[0:i], column_names[1]: data2['SKU'].iloc[0:i], column_names[2]: data2['Date'].iloc[0:i], column_names[3]: str(int(new_frame[column_names[3]].iloc[i]) + int(data2['total'].iloc[i]))})
    return new_frame

new_frame = skulocation(first_day, transactions, locations, skus)

CodePudding user response:

A very simple solution is to concatenate your two dataframes while adding a new index level, then apply a cumulative sum within each group and keep only the transaction level:

out = pd.concat([df1.rename(columns={'inventory': 'total'}), df2], 
                 keys=['inventory', 'transaction'])

out = out.assign(total=out.groupby(['location', 'sku'])['total'].cumsum()).loc['transaction']

Output:

       date  location   sku  total
0  01/02/20         2  2518      2
1  01/03/20         2  2518      0
2  01/04/20         2  2518      8
3  01/05/20         2  2518      5
4  01/02/20         2   667     13
5  01/03/20         2   667     17
6  01/05/20         2   667     15
7  01/06/20         2   667     18

Output after pd.concat:

>>> out
                   date  location   sku  total
inventory   0  01/01/20         2  2518      0  # From 1st dataframe
            1  01/01/20       105   425      1
            2  01/01/20         2   667      1
transaction 0  01/02/20         2  2518      2  # From 2nd dataframe
            1  01/03/20         2  2518     -2
            2  01/04/20         2  2518      8
            3  01/05/20         2  2518     -3
            4  01/02/20         2   667     12
            5  01/03/20         2   667      4
            6  01/05/20         2   667     -2
            7  01/06/20         2   667      3
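As a sketch, this whole approach can be reproduced end-to-end with the question's sample data (the frames below are built by hand to match it):

```python
import pandas as pd

# first_day table from the question
df1 = pd.DataFrame({'date': ['01/01/20'] * 3,
                    'location': [2, 105, 2],
                    'sku': [2518, 425, 667],
                    'inventory': [0, 1, 1]})
# transactions table from the question, with the signed 'total' column
df2 = pd.DataFrame({'date': ['01/02/20', '01/03/20', '01/04/20', '01/05/20',
                             '01/02/20', '01/03/20', '01/05/20', '01/06/20'],
                    'location': [2] * 8,
                    'sku': [2518] * 4 + [667] * 4,
                    'total': [2, -2, 8, -3, 12, 4, -2, 3]})

# stack inventory rows above transaction rows, tagged by a new index level
out = pd.concat([df1.rename(columns={'inventory': 'total'}), df2],
                keys=['inventory', 'transaction'])
# cumulative sum per location-sku group, then keep only transaction rows
out = out.assign(total=out.groupby(['location', 'sku'])['total'].cumsum())
result = out.loc['transaction']
print(result['total'].tolist())  # [2, 0, 8, 5, 13, 17, 15, 18]
```

This matches the expected result table in the question.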