Home > Software design >  How to get list of columns based on certain pattern from dataframe
How to get list of columns based on certain pattern from dataframe

Time:05-27

I have below columns in my dataframe, I want to get two list of columns as green highlighted below.

I tried the following codes and am able to get the desired list of sales columns which is col_sales that includes inv1.1 sales, inv1.2 sales, inv1.3 sales and inv1.4 sales

However, for col_num, I get an additional column which is inv1_num. What would be a way to exclude it? I only need inv1_1_num, inv1_2_num, inv1_3_num and inv1_4_num.

Be noted in my real dataset, I have more than 50 columns and they all follow the same pattern and these columns names cannot be changed as they will be used in the downstream process. Any suggestions are greatly apperciated.

enter image description here

import pandas as pd
df = {'inv1.1 sales':[1],
'inv1.2 sales':[2],
'inv1.3 sales':[3],
'inv1.4 sales':[1],
'inv1 sales':[2],
'inv2 sales':[3],
'inv3 sales':[3],
'inv1_1_num':[4],
'inv1_2_num':[3],
'inv1_3_num':[1],
'inv1_4_num':[3],
'inv1_num':[6],
'inv2_num':[2],
'inv3_num':[3]}

df=pd.DataFrame(df)
col_sales = [x for x in df.columns if ('inv1.' in x) & ('sales' in x)]
col_sales

col_num = [x for x in df.columns if ('inv1_' in x) & ('num' in x)]
col_num

['inv1_1_num', 'inv1_2_num', 'inv1_3_num', 'inv1_4_num', 'inv1_num']

CodePudding user response:

Adding count

col_num = [x for x in df.columns if ('inv1_' in x) & ('num' in x) &(x.count('_')==2)]
col_num
Out[298]: ['inv1_1_num', 'inv1_2_num', 'inv1_3_num', 'inv1_4_num']
  • Related