I have a huge csv file of dataframe. However, I don't have the date column. I only have the sales for every month from Jan-2022 until Dec-2034. Below is the example of my dataframe:
import pandas as pd
data = [[6661, 'Mobile Phone', 43578, 5000, 78564, 52353, 67456, 86965, 43634, 32546, 56332, 58944, 98878, 68588, 43634, 3463, 74533, 73733, 64436, 45426, 57333, 89762, 4373, 75457, 74845, 86843, 59957, 74563, 745335, 46342, 463473, 52352, 23622],
[6672, 'Play Station', 4475, 2546, 5757, 2352, 57896, 98574, 53536, 56533, 88645, 44884, 76585, 43575, 74573, 75347, 57573, 5736, 53737, 35235, 5322, 54757, 74573, 75473, 77362, 21554, 73462, 74736, 1435, 4367, 63462, 32362, 56332],
[6631, 'Laptop', 35347, 36376, 164577, 94584, 78675, 76758, 75464, 56373, 56343, 54787, 7658, 76584, 47347, 5748, 8684, 75373, 57573, 26626, 25632, 73774, 847373, 736646, 847457, 57346, 43732, 347346, 75373, 6473, 85674, 35743, 45734],
[6600, 'Camera', 14365, 60785, 25436, 46747, 75456, 97644, 63573, 56433, 25646, 32548, 14325, 64748, 68458, 46537, 7537, 46266, 7457, 78235, 46223, 8747, 67453, 4636, 3425, 4636, 352236, 6622, 64625, 36346, 46346, 35225, 6436],
[6643, 'Lamp', 324355, 143255, 696954, 97823, 43657, 66686, 56346, 57563, 65734, 64484, 87685, 54748, 9868, 573, 73472, 5735, 73422, 86352, 5325, 84333, 7473, 35252, 7547, 73733, 7374, 32266, 654747, 85743, 57333, 46346, 46266]]
ds = pd.DataFrame(data, columns = ['ID', 'Product', 'SalesJan-22', 'SalesFeb-22', 'SalesMar-22', 'SalesApr-22', 'SalesMay-22', 'SalesJun-22', 'SalesJul-22', 'SalesAug-22', 'SalesSep-22', 'SalesOct-22', 'SalesNov-22', 'SalesDec-22', 'SalesJan-23', 'SalesFeb-23', 'SalesMar-23', 'SalesApr-23', 'SalesMay-23', 'SalesJun-23', 'SalesJul-23', 'SalesAug-23', 'SalesSep-23', 'SalesOct-23', 'SalesNov-23', 'SalesDec-23', 'SalesJan-24', 'SalesFeb-24', 'SalesMar-24', 'SalesApr-24', 'SalesMay-24', 'SalesJun-24', 'SalesJul-24']
Since I have more than 10 monthly sales column, I want to loop the date after each of the month sales column. Then, the first 6 months will generate number 1, while the next 12 months will generate number 2, then another 12 months will generate number 3, another subsequent 12 months will generate number 4 and so on.
Below shows the sample of result that I want:
Is there any way to perform the loop and adding the date column beside each of the sales month?
CodePudding user response:
Here is the simplest approach I can think of:
for i, col in enumerate(ds.columns[2:]):
ds.insert(2 * i 2, col.removeprefix("Sales"), (i - 6) // 12 2)
CodePudding user response:
Here's a little solution : (I put the year unstead of your 1, 2, ... incrementation since i thought it is more representative, but you can change it easily)
idx_counter = 0
for idx, col in enumerate(ds.columns):
if col.startswith('Sales'):
date = col.replace('Sales', '')
year = col.split('-')[1]
ds.insert(loc=idx 1 idx_counter, column=date, value=[year] * ds.shape[0])
idx_counter = 1
output:
ID Product SalesJan-22 Jan-22 SalesFeb-22 Feb-22 SalesMar-22 Mar-22 SalesApr-22 Apr-22 ... SalesMar-24 Mar-24 SalesApr-24 Apr-24 SalesMay-24 May-24 SalesJun-24 Jun-24 SalesJul-24 Jul-24
0 6661 Mobile Phone 43578 22 5000 22 78564 22 52353 22 ... 745335 24 46342 24 463473 24 52352 24 23622 24
1 6672 Play Station 4475 22 2546 22 5757 22 2352 22 ... 1435 24 4367 24 63462 24 32362 24 56332 24
2 6631 Laptop 35347 22 36376 22 164577 22 94584 22 ... 75373 24 6473 24 85674 24 35743 24 45734 24
3 6600 Camera 14365 22 60785 22 25436 22 46747 22 ... 64625 24 36346 24 46346 24 35225 24 6436 24
4 6643 Lamp 324355 22 143255 22 696954 22 97823 22 ... 654747 24 85743 24 57333 24 46346 24 46266 24
CodePudding user response:
This should do the trick.
import math
new_cols = []
old_cols = [x for x in df.columns if x.startswith('Sales')]
for i, col in enumerate(old_cols):
new_cols.append(col[5:])
if i < 6:
val = 1
else:
val = ((i 6)/12) 1
df[col[5:]] = math.floor(val)
df[['ID', 'Product'] [x for y in zip(old_cols, new_cols) for x in y]]
CodePudding user response:
Here is a vectorial approach (using insert
repeatedly is inefficient):
# convert (valid) columns to datetime
cols = pd.to_datetime(ds.columns, format='Sales%b-%y', errors='coerce')
# identify valid dates
m = cols.notna()
# get year
y = cols[m].year
# calculate number (1 for first 6 months, then 1 per 12 months)
num = ((cols[m].month 12*(y-y.min())) 5)//12 1
# slice dates columns, assign the number, rename
df2 = (ds.loc[:, m].assign(**dict(zip(ds.columns[m], num)))
.rename(columns=lambda x: x[5:])
)
# get new order of columns
idx = np.r_[np.zeros((~m).sum()), np.tile(np.arange(m.sum()), 2) 1]
# concat and reorder
out = pd.concat([ds, df2], axis=1).iloc[:, np.argsort(idx)]
print(out)
output:
ID Product SalesJan-22 Jan-22 SalesFeb-22 Feb-22 SalesMar-22 Mar-22 SalesApr-22 Apr-22 SalesMay-22 May-22 SalesJun-22 Jun-22 SalesJul-22 Jul-22 SalesAug-22 Aug-22 Sep-22 SalesSep-22 Oct-22 SalesOct-22 SalesNov-22 Nov-22 Dec-22 SalesDec-22 Jan-23 SalesJan-23 Feb-23 SalesFeb-23 SalesMar-23 Mar-23 Apr-23 SalesApr-23 SalesMay-23 May-23 SalesJun-23 Jun-23 Jul-23 SalesJul-23 SalesAug-23 Aug-23 Sep-23 SalesSep-23 SalesOct-23 Oct-23 Nov-23 SalesNov-23 Dec-23 SalesDec-23 Jan-24 SalesJan-24 Feb-24 SalesFeb-24 Mar-24 SalesMar-24 Apr-24 SalesApr-24 May-24 SalesMay-24 SalesJun-24 Jun-24 SalesJul-24 Jul-24
0 6661 Mobile Phone 43578 1 5000 1 78564 1 52353 1 67456 1 86965 1 43634 2 32546 2 2 56332 2 58944 98878 2 2 68588 2 43634 2 3463 74533 2 2 73733 64436 2 45426 2 3 57333 89762 3 3 4373 75457 3 3 74845 3 86843 3 59957 3 74563 3 745335 3 46342 3 463473 52352 3 23622 4
1 6672 Play Station 4475 1 2546 1 5757 1 2352 1 57896 1 98574 1 53536 2 56533 2 2 88645 2 44884 76585 2 2 43575 2 74573 2 75347 57573 2 2 5736 53737 2 35235 2 3 5322 54757 3 3 74573 75473 3 3 77362 3 21554 3 73462 3 74736 3 1435 3 4367 3 63462 32362 3 56332 4
2 6631 Laptop 35347 1 36376 1 164577 1 94584 1 78675 1 76758 1 75464 2 56373 2 2 56343 2 54787 7658 2 2 76584 2 47347 2 5748 8684 2 2 75373 57573 2 26626 2 3 25632 73774 3 3 847373 736646 3 3 847457 3 57346 3 43732 3 347346 3 75373 3 6473 3 85674 35743 3 45734 4
3 6600 Camera 14365 1 60785 1 25436 1 46747 1 75456 1 97644 1 63573 2 56433 2 2 25646 2 32548 14325 2 2 64748 2 68458 2 46537 7537 2 2 46266 7457 2 78235 2 3 46223 8747 3 3 67453 4636 3 3 3425 3 4636 3 352236 3 6622 3 64625 3 36346 3 46346 35225 3 6436 4
4 6643 Lamp 324355 1 143255 1 696954 1 97823 1 43657 1 66686 1 56346 2 57563 2 2 65734 2 64484 87685 2 2 54748 2 9868 2 573 73472 2 2 5735 73422 2 86352 2 3 5325 84333 3 3 7473 35252 3 3 7547 3 73733 3 7374 3 32266 3 654747 3 85743 3 57333 46346 3 46266 4