Home > Back-end >  Loop every End of Month and Adding the Column every Monthly Sales Column Pandas
Loop every End of Month and Adding the Column every Monthly Sales Column Pandas

Time:03-16

In my dataframe, I don't have the date column. I only have the sales for every month from Jan-2022 until Dec-2034. Below is the example of my dataframe:

import pandas as pd 

data = [[6661, 'Mobile Phone', 43578, 5000, 78564, 52353], [6672, 'Play Station', 4475, 2546, 5757, 2352], 
        [6631, 'Personal Computer', 35347, 36376, 164577, 94584], [6600, 'Camera', 14365, 60785, 25436, 46747], 
        [6643, 'Lamp', 324355, 143255, 696954, 97823]]
ds = pd.DataFrame(data, columns = ['ID', 'Product', 'Sales_Jan-22', 'Sales_Feb-22', 'Sales_Mac-22', 'Sales_Apr-22'])

enter image description here

Since I have more than 10 monthly sales column, I want to loop the date after each of the month sales column. Below shows the sample of result that I want:

enter image description here

It's easy if the column is just less than 10. However, since the monthly would be from Jan-22 until Dec-2034 (quite a lot), is there any way to perform the loop and adding the date column beside each of the sales month?

CodePudding user response:

Why don't you create a Date column? It will be a lot easier to filter the data afterwards to get how many of a given product was sold in a month:

import pandas as pd 
from datetime import datetime

data = [[6661, 'Mobile Phone', 43578, 5000, 78564, 52353], [6672, 'Play Station', 4475, 2546, 5757, 2352], 
        [6631, 'Personal Computer', 35347, 36376, 164577, 94584], [6600, 'Camera', 14365, 60785, 25436, 46747], 
        [6643, 'Lamp', 324355, 143255, 696954, 97823]]
ds = pd.DataFrame(data, columns = ['ID', 'Product', 'Sales_Jan-22', 'Sales_Feb-22', 'Sales_Mar-22', 'Sales_Apr-22'])

month_cols = ds.columns[ds.columns.str.contains('Sales')] # get all the months and years from the column names
m_y = [datetime.strptime(x[-6:], '%b-%y') for x in month_cols] # convert to datetime format

# Create a new dataframe with a 'Date' column

df = ds[['ID', 'Product']]
all_months = []

for i, d in enumerate(m_y):
    df.loc[:, 'Date'] = d
    df.loc[:, 'Sales'] = ds.loc[:, month_cols[i]]
    all_months.append(df)

final_df = pd.concat(all_months).groupby(['Date', 'ID', 'Product']).sum()
final_df

ID  Product                     Date        Sales
0   6661    Mobile Phone        2022-04-01  52353
1   6672    Play Station        2022-04-01  2352
2   6631    Personal Computer   2022-04-01  94584
3   6600    Camera              2022-04-01  46747
4   6643    Lamp                2022-04-01  97823
0   6661    Mobile Phone        2022-04-01  52353
1   6672    Play Station        2022-04-01  2352
2   6631    Personal Computer   2022-04-01  94584
3   6600    Camera              2022-04-01  46747
4   6643    Lamp                2022-04-01  97823
0   6661    Mobile Phone        2022-04-01  52353
1   6672    Play Station        2022-04-01  2352
2   6631    Personal Computer   2022-04-01  94584
3   6600    Camera              2022-04-01  46747
4   6643    Lamp                2022-04-01  97823
0   6661    Mobile Phone        2022-04-01  52353
1   6672    Play Station        2022-04-01  2352
2   6631    Personal Computer   2022-04-01  94584
3   6600    Camera              2022-04-01  46747
4   6643    Lamp                2022-04-01  97823

CodePudding user response:

Updating the answer following @fhaney comment.

If all you want is to place a new column after each sales column, you can do:

def insert_col_after_sales(df, sales_col, new_col):
    cols = df.columns.to_list()
    i = df.columns.get_loc(sales_col)
    cols.insert(i   1, new_col)
    df[new_col] = ''
    df = df.reindex(columns=cols)
    df.loc[:, new_col] = datetime.strptime(new_col, '%b-%y')
    return df

for i, d in enumerate(m_y):
    sales_col = sales[i]
    df = insert_col_after_sales(df, sales_col, d)

output:

ID Product Sales_Jan-22 Jan-22 Sales_Feb-22 Feb-22 Sales_Mar-22 Mar-22 Sales_Apr-22 Apr-22
0 6661 Mobile Phone 43578 2022-01-01 5000 2022-02-01 78564 2022-03-01 52353
1 6672 Play Station 4475 2022-01-01 2546 2022-02-01 5757 2022-03-01 2352
2 6631 Personal Computer 35347 2022-01-01 36376 2022-02-01 164577 2022-03-01 94584
3 6600 Camera 14365 2022-01-01 60785 2022-02-01 25436 2022-03-01 46747
4 6643 Lamp 324355 2022-01-01 143255 2022-02-01 696954 2022-03-01 97823

CodePudding user response:

Another alternative (works only with corrected Mac-22 column name):

values = (pd.Series(pd.to_datetime(ds.columns[2:].str[-6:], format="%b-%y"))
            .dt.to_period("M").dt.strftime("%d-%m-%Y"))

result = pd.concat(
    [ds[ds.columns[:2]]]   [
        ds[[col]].assign(**{col[-6:]: value})
        for col, value in zip(ds.columns[2:], values)
    ],
    axis="columns"
)

Result:

     ID            Product  Sales_Jan-22      Jan-22  Sales_Feb-22  \
0  6661       Mobile Phone         43578  31-01-2022          5000   
1  6672       Play Station          4475  31-01-2022          2546   
2  6631  Personal Computer         35347  31-01-2022         36376   
3  6600             Camera         14365  31-01-2022         60785   
4  6643               Lamp        324355  31-01-2022        143255   

       Feb-22  Sales_Mar-22      Mar-22  Sales_Apr-22      Apr-22  
0  28-02-2022         78564  31-03-2022         52353  30-04-2022  
1  28-02-2022          5757  31-03-2022          2352  30-04-2022  
2  28-02-2022        164577  31-03-2022         94584  30-04-2022  
3  28-02-2022         25436  31-03-2022         46747  30-04-2022  
4  28-02-2022        696954  31-03-2022         97823  30-04-2022  
  • Related