I have below columns in my dataframe, I want to get two list of columns as green highlighted below.
I tried the following codes and am able to get the desired list of sales columns which is col_sales that includes inv1.1 sales, inv1.2 sales, inv1.3 sales and inv1.4 sales
However, for col_num, I get an additional column which is inv1_num. What would be a way to exclude it? I only need inv1_1_num, inv1_2_num, inv1_3_num and inv1_4_num.
Be noted in my real dataset, I have more than 50 columns and they all follow the same pattern and these columns names cannot be changed as they will be used in the downstream process. Any suggestions are greatly apperciated.
import pandas as pd
df = {'inv1.1 sales':[1],
'inv1.2 sales':[2],
'inv1.3 sales':[3],
'inv1.4 sales':[1],
'inv1 sales':[2],
'inv2 sales':[3],
'inv3 sales':[3],
'inv1_1_num':[4],
'inv1_2_num':[3],
'inv1_3_num':[1],
'inv1_4_num':[3],
'inv1_num':[6],
'inv2_num':[2],
'inv3_num':[3]}
df=pd.DataFrame(df)
col_sales = [x for x in df.columns if ('inv1.' in x) & ('sales' in x)]
col_sales
col_num = [x for x in df.columns if ('inv1_' in x) & ('num' in x)]
col_num
['inv1_1_num', 'inv1_2_num', 'inv1_3_num', 'inv1_4_num', 'inv1_num']
CodePudding user response:
Adding count
col_num = [x for x in df.columns if ('inv1_' in x) & ('num' in x) &(x.count('_')==2)]
col_num
Out[298]: ['inv1_1_num', 'inv1_2_num', 'inv1_3_num', 'inv1_4_num']