I have below dataframe columns:
Index(['Location' 'Dec-2021_x', 'Jan-2022_x', 'Feb-2022_x', 'Mar-2022_x',
'Apr-2022_x', 'May-2022_x', 'Jun-2022_x', 'Jul-2022_x', 'Aug-2022_x',
'Sep-2022_x', 'Oct-2022_x', 'Nov-2022_x', 'Dec-2022_x', 'Jan-2023_x',
'Feb-2023_x', 'Mar-2023_x', 'Apr-2023_x', 'May-2023_x', 'Jun-2023_x',
'Jul-2023_x', 'Aug-2023_x', 'Sep-2023_x', 'Oct-2023_x', 'Nov-2023_x',
'Dec-2023_x', 'Jan-2024_x', 'Feb-2024_x', 'Mar-2024_x', 'Apr-2024_x',
'May-2024_x', 'Jun-2024_x', 'Jul-2024_x', 'Aug-2024_x', 'Sep-2024_x',
'Oct-2024_x', 'Nov-2024_x', 'Dec-2024_x',
'sum_val',
'Dec-2021_y', 'Jan-2022_y', 'Feb-2022_y',
'Mar-2022_y', 'Apr-2022_y', 'May-2022_y', 'Jun-2022_y', 'Jul-2022_y',
'Aug-2022_y', 'Sep-2022_y', 'Oct-2022_y', 'Nov-2022_y', 'Dec-2022_y',
'Jan-2023_y', 'Feb-2023_y', 'Mar-2023_y', 'Apr-2023_y', 'May-2023_y',
'Jun-2023_y', 'Jul-2023_y', 'Aug-2023_y', 'Sep-2023_y', 'Oct-2023_y',
'Nov-2023_y', 'Dec-2023_y', 'Jan-2024_y', 'Feb-2024_y', 'Mar-2024_y',
'Apr-2024_y', 'May-2024_y', 'Jun-2024_y', 'Jul-2024_y', 'Aug-2024_y',
'Sep-2024_y', 'Oct-2024_y', 'Nov-2024_y', 'Dec-2024_y'],
dtype='object')
Sample dataframe with reduced columns looks like this:
df:
Location Dec-2021_x Jan-2022_x sum_val Dec-2021_y Jan-2022_y
A 212 315 1000 12 13
B 312 612 1100 13 17
C 242 712 1010 15 15
D 215 382 1001 16 17
E 252 319 1110 17 18
I have to create a resultant dataframe which will be in the below format:
Index(['Location' 'Dec-2021', 'Jan-2022', 'Feb-2022', 'Mar-2022',
'Apr-2022', 'May-2022', 'Jun-2022', 'Jul-2022', 'Aug-2022',
'Sep-2022', 'Oct-2022', 'Nov-2022', 'Dec-2022', 'Jan-2023',
'Feb-2023', 'Mar-2023', 'Apr-2023', 'May-2023', 'Jun-2023',
'Jul-2023', 'Aug-2023', 'Sep-2023', 'Oct-2023', 'Nov-2023',
'Dec-2023', 'Jan-2024', 'Feb-2024', 'Mar-2024', 'Apr-2024',
'May-2024', 'Jun-2024', 'Jul-2024', 'Aug-2024', 'Sep-2024',
'Oct-2024', 'Nov-2024', 'Dec-2024'
dtype='object')
The way we do this is using the formula:
'Dec-2021' = 'Dec-2021_x' * sum_val * 'Dec-2021_y' (these are all numeric columns)
and a similar way for all the months. There are 36 months to be precise. Is there any way to do this in a loop manner for each column in the month-year combination? There are around 65000 rows here so do not want to overwhelm the system.
CodePudding user response:
Use:
#sample data
np.random.seed(2022)
c = ['Location', 'Dec-2021_x', 'Jan-2022_x', 'Feb-2022_x', 'Mar-2022_x',
'Apr-2022_x','sum_val', 'Dec-2021_y', 'Jan-2022_y', 'Feb-2022_y',
'Mar-2022_y', 'Apr-2022_y']
df = (pd.DataFrame(np.random.randint(10, size=(5, len(c))), columns=c)
.assign(Location=list('abcde')))
print (df)
Location Dec-2021_x Jan-2022_x Feb-2022_x Mar-2022_x Apr-2022_x \
0 a 1 1 0 7 8
1 b 8 0 3 6 8
2 c 1 7 5 5 4
3 d 0 7 5 5 8
4 e 8 0 3 9 5
sum_val Dec-2021_y Jan-2022_y Feb-2022_y Mar-2022_y Apr-2022_y
0 2 8 0 5 9 1
1 0 1 2 0 5 7
2 8 2 3 1 0 4
3 2 4 0 9 4 9
4 2 1 7 2 1 7
#remove unnecessary columns
df1 = df.drop(['sum_val'], axis=1)
#add columns names for not necessary remove - if need in ouput
df1 = df1.set_index('Location')
#split columns names by last _
df1.columns = df1.columns.str.rsplit('_', n=1, expand=True)
#seelct x and y Dataframes by second level and multiple
df2 = (df1.xs('x', axis=1, level=1).mul(df['sum_val'].to_numpy(), axis= 0) *
df1.xs('y', axis=1, level=1))
print (df2)
Dec-2021 Jan-2022 Feb-2022 Mar-2022 Apr-2022
Location
a 16 0 0 126 16
b 0 0 0 0 0
c 16 168 40 0 128
d 0 0 90 40 144
e 16 0 12 18 70