Loop every End of Month and Adding the Column every Monthly Sales Column Pandas-CodePudding

In my dataframe, I don't have the date column. I only have the sales for every month from Jan-2022 until Dec-2034. Below is the example of my dataframe:

import pandas as pd 

data = [[6661, 'Mobile Phone', 43578, 5000, 78564, 52353], [6672, 'Play Station', 4475, 2546, 5757, 2352], 
        [6631, 'Personal Computer', 35347, 36376, 164577, 94584], [6600, 'Camera', 14365, 60785, 25436, 46747], 
        [6643, 'Lamp', 324355, 143255, 696954, 97823]]
ds = pd.DataFrame(data, columns = ['ID', 'Product', 'Sales_Jan-22', 'Sales_Feb-22', 'Sales_Mac-22', 'Sales_Apr-22'])

Since I have more than 10 monthly sales column, I want to loop the date after each of the month sales column. Below shows the sample of result that I want:

It's easy if the column is just less than 10. However, since the monthly would be from Jan-22 until Dec-2034 (quite a lot), is there any way to perform the loop and adding the date column beside each of the sales month?

CodePudding user response：

Why don't you create a Date column? It will be a lot easier to filter the data afterwards to get how many of a given product was sold in a month:

import pandas as pd 
from datetime import datetime

data = [[6661, 'Mobile Phone', 43578, 5000, 78564, 52353], [6672, 'Play Station', 4475, 2546, 5757, 2352], 
        [6631, 'Personal Computer', 35347, 36376, 164577, 94584], [6600, 'Camera', 14365, 60785, 25436, 46747], 
        [6643, 'Lamp', 324355, 143255, 696954, 97823]]
ds = pd.DataFrame(data, columns = ['ID', 'Product', 'Sales_Jan-22', 'Sales_Feb-22', 'Sales_Mar-22', 'Sales_Apr-22'])

month_cols = ds.columns[ds.columns.str.contains('Sales')] # get all the months and years from the column names
m_y = [datetime.strptime(x[-6:], '%b-%y') for x in month_cols] # convert to datetime format

# Create a new dataframe with a 'Date' column

df = ds[['ID', 'Product']]
all_months = []

for i, d in enumerate(m_y):
    df.loc[:, 'Date'] = d
    df.loc[:, 'Sales'] = ds.loc[:, month_cols[i]]
    all_months.append(df)

final_df = pd.concat(all_months).groupby(['Date', 'ID', 'Product']).sum()
final_df

ID  Product                     Date        Sales
0   6661    Mobile Phone        2022-04-01  52353
1   6672    Play Station        2022-04-01  2352
2   6631    Personal Computer   2022-04-01  94584
3   6600    Camera              2022-04-01  46747
4   6643    Lamp                2022-04-01  97823
0   6661    Mobile Phone        2022-04-01  52353
1   6672    Play Station        2022-04-01  2352
2   6631    Personal Computer   2022-04-01  94584
3   6600    Camera              2022-04-01  46747
4   6643    Lamp                2022-04-01  97823
0   6661    Mobile Phone        2022-04-01  52353
1   6672    Play Station        2022-04-01  2352
2   6631    Personal Computer   2022-04-01  94584
3   6600    Camera              2022-04-01  46747
4   6643    Lamp                2022-04-01  97823
0   6661    Mobile Phone        2022-04-01  52353
1   6672    Play Station        2022-04-01  2352
2   6631    Personal Computer   2022-04-01  94584
3   6600    Camera              2022-04-01  46747
4   6643    Lamp                2022-04-01  97823

CodePudding user response：

Updating the answer following @fhaney comment.

If all you want is to place a new column after each sales column, you can do:

def insert_col_after_sales(df, sales_col, new_col):
    cols = df.columns.to_list()
    i = df.columns.get_loc(sales_col)
    cols.insert(i   1, new_col)
    df[new_col] = ''
    df = df.reindex(columns=cols)
    df.loc[:, new_col] = datetime.strptime(new_col, '%b-%y')
    return df

for i, d in enumerate(m_y):
    sales_col = sales[i]
    df = insert_col_after_sales(df, sales_col, d)

output:

ID	Product	Sales_Jan-22	Jan-22	Sales_Feb-22	Feb-22	Sales_Mar-22	Mar-22	Sales_Apr-22	Apr-22
0	6661	Mobile Phone	43578	2022-01-01	5000	2022-02-01	78564	2022-03-01	52353
1	6672	Play Station	4475	2022-01-01	2546	2022-02-01	5757	2022-03-01	2352
2	6631	Personal Computer	35347	2022-01-01	36376	2022-02-01	164577	2022-03-01	94584
3	6600	Camera	14365	2022-01-01	60785	2022-02-01	25436	2022-03-01	46747
4	6643	Lamp	324355	2022-01-01	143255	2022-02-01	696954	2022-03-01	97823

CodePudding user response：

Another alternative (works only with corrected Mac-22 column name):

values = (pd.Series(pd.to_datetime(ds.columns[2:].str[-6:], format="%b-%y"))
            .dt.to_period("M").dt.strftime("%d-%m-%Y"))

result = pd.concat(
    [ds[ds.columns[:2]]]   [
        ds[[col]].assign(**{col[-6:]: value})
        for col, value in zip(ds.columns[2:], values)
    ],
    axis="columns"
)

Result:

     ID            Product  Sales_Jan-22      Jan-22  Sales_Feb-22  \
0  6661       Mobile Phone         43578  31-01-2022          5000   
1  6672       Play Station          4475  31-01-2022          2546   
2  6631  Personal Computer         35347  31-01-2022         36376   
3  6600             Camera         14365  31-01-2022         60785   
4  6643               Lamp        324355  31-01-2022        143255   

       Feb-22  Sales_Mar-22      Mar-22  Sales_Apr-22      Apr-22  
0  28-02-2022         78564  31-03-2022         52353  30-04-2022  
1  28-02-2022          5757  31-03-2022          2352  30-04-2022  
2  28-02-2022        164577  31-03-2022         94584  30-04-2022  
3  28-02-2022         25436  31-03-2022         46747  30-04-2022  
4  28-02-2022        696954  31-03-2022         97823  30-04-2022