Home > OS >  Selecting columns based on characters in column names
Selecting columns based on characters in column names

Time:11-18

I have a pandas dataframe with columns names as ['INV01_M1_I', 'INV01_M1_V', 'INV01_M2_I', 'INV01_M2_V', 'INV02_M1_I', 'INV02_M1_V', 'INV02_M2_I', 'INV02_M2_V'....] AND SO ON. I want to sum those columns which have same 'INV_no_here' and the last character i.e. I or V. That is sum INV01_M1_I INVO1_M2_I in one column and INV02_M1_I INV02_M2_I in one column(if i can name them there it will be nice to do so). I have almost 100 columns where number changes from 01 to the end for INV. I have gone through different answers where regex, filter(like=), and other different solutions are provided. But I need to match first 5 characters and last character and then also sum those columns.

import numpy as np
import pandas pd
data = [[20, 10, 13, 16, 18, 20, 9, 6], [7, 15, 11, 16, 27, 7, 19, 10]]
df = pd.DataFrame(data, columns=['INV01_M1_I', 'INV01_M1_V','INV01_M2_I','INV01_M2_V',
'INV02_M1_I','INV02_M1_V','INV02_M2_I','INV02_M2_V'])
print(df)

CodePudding user response:

here is one way :

for cols in df.columns.str.split('_'):    
    if not cols[0]  '_'  cols[2] in df.columns:
        df[cols[0]  '_'  cols[2]] = df[[col for col in df.columns if col.startswith(cols[0]) and col.endswith(cols[2])]].sum(axis=1)

output :

>>
   INV01_M1_I  INV01_M1_V  INV01_M2_I  ...  INV01_V  INV02_I  INV02_V
0          20          10          13  ...       26       27       26
1           7          15          11  ...       31       46       17
  • Related