Subset a dataframe into multiple dataframes using a dictionary

Time:09-28

I need to split a dataframe into three dataframes, based on column-name prefixes that are stored in a dictionary.

Here is a df:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 5, size=(5, 10)),
                  columns=('a_group1_sub', 'a_group1_actual', 'b_group1_sub', 'b_group1_actual',
                           'b_group2_total', 'b_group2_sub', 'b_group2_expected',
                           'class_first', 'class_second', 'area_x'))

And here is a dictionary that defines how I want to split the dataframe: df1 = a_group1, df2 = b_group1 and b_group2, df3 = class and area.

groups = {1: ['a_group1'], 2: ['b_group1', 'b_group2'], 3: ['class', 'area']}

Here is a loop I have tried:

for k, v in groups.items():
    # does not work: [v] wraps the prefix list in another list
    print(df.loc[:, df.columns.str.startswith([v])])

It works if I pass a single string, but not in the loop:

df.loc[:, df.columns.str.startswith('a_group1')]
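The difference comes down to what `startswith` accepts. Python's built-in `str.startswith` takes either a single prefix or a tuple of prefixes (matching if any one of them fits), but not a list; recent pandas versions document `.str.startswith` the same way (`str` or tuple of `str`). The plain-Python sketch below, using the column names from the example, shows both cases; the variable names are illustrative only.

```python
# Column names from the example dataframe
cols = ['a_group1_sub', 'a_group1_actual', 'b_group1_sub', 'b_group1_actual',
        'b_group2_total', 'b_group2_sub', 'b_group2_expected',
        'class_first', 'class_second', 'area_x']
prefixes = ['b_group1', 'b_group2']  # one value from the groups dictionary

# A tuple of prefixes: a name matches if it starts with any of them
matched = [c for c in cols if c.startswith(tuple(prefixes))]
print(matched)
# ['b_group1_sub', 'b_group1_actual', 'b_group2_total', 'b_group2_sub', 'b_group2_expected']

# A list of prefixes is rejected by str.startswith with a TypeError
raised = False
try:
    cols[0].startswith(prefixes)
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```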

Any comments are welcome, thank you so much!

Answer:

Is that what you are trying to do?

df_list = list() # The output list of dataframes
for k, v in groups.items(): # for v in groups.values() if you don't use k
    # Get the columns that start with any of the elements in v
    cols = [c for c in df.columns if c.startswith(tuple(v))]
    # Subset df, df[cols], and append to the list of dataframes
    df_list.append(df[cols])

# df_list[0] holds the columns for groups[1], df_list[1] for groups[2], and so on (dicts keep insertion order)
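If you would rather look the sub-dataframes up by the same keys as `groups` instead of by list position, a dictionary works too. The sketch below is one alternative, not the answerer's method: it swaps the list comprehension for pandas' `DataFrame.filter(regex=..., axis=1)`, building an anchored pattern such as `^(?:b_group1|b_group2)` from each prefix list. The helper name `split_by_prefix` is hypothetical.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 5, size=(5, 10)),
                  columns=['a_group1_sub', 'a_group1_actual', 'b_group1_sub', 'b_group1_actual',
                           'b_group2_total', 'b_group2_sub', 'b_group2_expected',
                           'class_first', 'class_second', 'area_x'])
groups = {1: ['a_group1'], 2: ['b_group1', 'b_group2'], 3: ['class', 'area']}


def split_by_prefix(frame, prefix_groups):
    """Hypothetical helper: return {group_key: sub-DataFrame of columns with a matching prefix}."""
    out = {}
    for key, prefixes in prefix_groups.items():
        # '^' anchors the match at the start of the column name;
        # re.escape keeps any regex metacharacters in a prefix literal
        pattern = '^(?:' + '|'.join(map(re.escape, prefixes)) + ')'
        out[key] = frame.filter(regex=pattern, axis=1)
    return out


dfs = split_by_prefix(df, groups)
print(list(dfs[3].columns))  # ['class_first', 'class_second', 'area_x']
```

`filter` keeps the matching columns in their original order, so `dfs[2]` lists the `b_group1_*` columns before the `b_group2_*` ones, just as they appear in `df`.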