I need to split a dataframe into three dataframes based on the prefixes of its column names. The prefixes for each group are stored in a dictionary.
Here is the df:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,5,size=(5, 10)), columns=('a_group1_sub','a_group1_actual','b_group1_sub','b_group1_actual','b_group2_total','b_group2_sub','b_group2_expected','class_first','class_second','area_x'))
And here is the dictionary. I want to separate the dataframe based on the following groupings: df1 = a_group1, df2 = b_group1 and b_group2, df3 = class and area
groups = dict({1: ['a_group1'], 2: ['b_group1', 'b_group2'], 3: ['class', 'area']})
Here is the loop I have tried:
for k, v in groups.items():
    print(df.loc[:,df.columns.str.startswith([v])])
It works if I do something like this with a single prefix, but not in the loop:
df.loc[:,df.columns.str.startswith('a_group1')]
Any comments are welcome, thank you so much.
CodePudding user response:
Is this what you are trying to do?
df_list = list()  # The output list of dataframes
for k, v in groups.items():  # for v in groups.values() if you don't use k
    # Get the columns that start with any of the elements in v
    cols = [c for c in df.columns if c.startswith(tuple(v))]
    # Subset df, df[cols], and append to the list of dataframes
    df_list.append(df[cols])
# df_list[i] contains the dataframe i
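If you would rather look up each subset by its dictionary key than by its position in a list, one possible variation is sketched below. It swaps the list comprehension for DataFrame.filter with a regex anchored at the start of the column name. The helper name split_by_prefix and the variable subsets are made up for illustration and are not part of the question or of pandas.

```python
import re

import numpy as np
import pandas as pd

columns = ['a_group1_sub', 'a_group1_actual', 'b_group1_sub', 'b_group1_actual',
           'b_group2_total', 'b_group2_sub', 'b_group2_expected',
           'class_first', 'class_second', 'area_x']
df = pd.DataFrame(np.random.randint(0, 5, size=(5, 10)), columns=columns)
groups = {1: ['a_group1'], 2: ['b_group1', 'b_group2'], 3: ['class', 'area']}


def split_by_prefix(frame, prefix_groups):
    """Hypothetical helper: map each key to the columns starting with any of its prefixes."""
    result = {}
    for key, prefixes in prefix_groups.items():
        # '^' anchors the match at the start of the column name;
        # re.escape stops characters in a prefix from acting as regex syntax.
        pattern = '^(?:' + '|'.join(re.escape(p) for p in prefixes) + ')'
        result[key] = frame.filter(regex=pattern)
    return result


subsets = split_by_prefix(df, groups)
print(subsets[2].columns.tolist())
```

As for why the original loop fails: v is already a list, so [v] is a list inside a list, and Python's str.startswith accepts only a string or a tuple of strings, which is why the answer converts with tuple(v).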