I have a pandas dataframe with columns names as ['INV01_M1_I', 'INV01_M1_V', 'INV01_M2_I', 'INV01_M2_V', 'INV02_M1_I', 'INV02_M1_V', 'INV02_M2_I', 'INV02_M2_V'....] AND SO ON. I want to sum those columns which have same 'INV_no_here' and the last character i.e. I or V. That is sum INV01_M1_I INVO1_M2_I in one column and INV02_M1_I INV02_M2_I in one column(if i can name them there it will be nice to do so). I have almost 100 columns where number changes from 01 to the end for INV. I have gone through different answers where regex, filter(like=), and other different solutions are provided. But I need to match first 5 characters and last character and then also sum those columns.
import numpy as np
import pandas pd
data = [[20, 10, 13, 16, 18, 20, 9, 6], [7, 15, 11, 16, 27, 7, 19, 10]]
df = pd.DataFrame(data, columns=['INV01_M1_I', 'INV01_M1_V','INV01_M2_I','INV01_M2_V',
'INV02_M1_I','INV02_M1_V','INV02_M2_I','INV02_M2_V'])
print(df)
CodePudding user response:
here is one way :
for cols in df.columns.str.split('_'):
if not cols[0] '_' cols[2] in df.columns:
df[cols[0] '_' cols[2]] = df[[col for col in df.columns if col.startswith(cols[0]) and col.endswith(cols[2])]].sum(axis=1)
output :
>>
INV01_M1_I INV01_M1_V INV01_M2_I ... INV01_V INV02_I INV02_V
0 20 10 13 ... 26 27 26
1 7 15 11 ... 31 46 17