Select columns and create new dataframe

I have a dataframe with more than 5000 columns, but here is an example of what it looks like:

data = {'AST_0-1': [1, 2, 3], 
        'AST_0-45': [4, 5, 6],
        'AST_0-135': [7, 8, 20],
        'AST_10-1': [10, 20, 32], 
        'AST_10-45': [47, 56, 67],
        'AST_10-135': [48, 57, 64],
        'AST_110-1': [100, 85, 93],
        'AST_110-45': [100, 25, 37],
        'AST_110-135': [44, 55, 67]}

I want to create multiple new dataframes based on the number after the "-" in the column names. For example, one dataframe with all the columns that end with "1" [df1=(AST_0-1;AST_10-1;AST_110-1)], another for the columns that end with "45", and another for those that end with "135". I know I will need a loop for that, but I am having trouble selecting the columns to then create the dataframes.

CodePudding user response：

You can use str.extract on the column names to get the wanted ID, then groupby on axis=1.

This creates a dictionary of dataframes:

group = df.columns.str.extract(r'(\d+)$', expand=False)

out = dict(list(df.groupby(group, axis=1)))

Output:

{'1':    AST_0-1  AST_10-1  AST_110-1
 0        1        10        100
 1        2        20         85
 2        3        32         93,
 '135':    AST_0-135  AST_10-135  AST_110-135
 0          7          48           44
 1          8          57           55
 2         20          64           67,
 '45':    AST_0-45  AST_10-45  AST_110-45
 0         4         47         100
 1         5         56          25
 2         6         67          37}

Accessing ID 135:

out['135']
   AST_0-135  AST_10-135  AST_110-135
0          7          48           44
1          8          57           55
2         20          64           67
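
As far as I know, groupby with axis=1 is deprecated in recent pandas releases (2.1 and later), so here is a minimal sketch of the same idea that selects columns with a boolean mask instead. The names suffix and out are only illustrative, and it assumes every column name ends in digits.

```python
import pandas as pd

data = {'AST_0-1': [1, 2, 3], 'AST_0-45': [4, 5, 6], 'AST_0-135': [7, 8, 20],
        'AST_10-1': [10, 20, 32], 'AST_10-45': [47, 56, 67], 'AST_10-135': [48, 57, 64],
        'AST_110-1': [100, 85, 93], 'AST_110-45': [100, 25, 37], 'AST_110-135': [44, 55, 67]}
df = pd.DataFrame(data)

# Trailing digits of each column name, e.g. 'AST_10-45' -> '45'
suffix = df.columns.str.extract(r'(\d+)$', expand=False)

# One sub-dataframe per distinct suffix, picked out with a boolean column mask
out = {key: df.loc[:, suffix == key] for key in suffix.unique()}

print(out['135'])
```

The keys come out in order of first appearance ('1', '45', '135') rather than sorted, which should not matter when you look frames up by key.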

CodePudding user response：

Use:

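import pandas as pd
df = pd.DataFrame(data)
# Group the columns by the text after the last '-' and collect the pieces in a dict
dfs = dict(list(df.groupby(df.columns.str.rsplit('-', n=1).str[1], axis=1)))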

Output:

>>> dfs
{'1':    AST_0-1  AST_10-1  AST_110-1
 0        1        10        100
 1        2        20         85
 2        3        32         93,

 '135':    AST_0-135  AST_10-135  AST_110-135
 0          7          48           44
 1          8          57           55
 2         20          64           67,

 '45':    AST_0-45  AST_10-45  AST_110-45
 0         4         47         100
 1         5         56          25
 2         6         67          37}

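I know it's strongly discouraged, but maybe you want to create dataframes named df1, df135 and df45. In that case, you can use:

# globals() is used because, inside a function, writing to locals() does not create variables
for name, sub_df in dfs.items():
    globals()[f'df{name}'] = sub_df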

>>> df1
   AST_0-1  AST_10-1  AST_110-1
0        1        10        100
1        2        20         85
2        3        32         93

>>> df135
   AST_0-135  AST_10-135  AST_110-135
0          7          48           44
1          8          57           55
2         20          64           67

>>> df45
   AST_0-45  AST_10-45  AST_110-45
0         4         47         100
1         5         56          25
2         6         67          37
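
If you would rather avoid dynamically created variables, the dictionary can usually stand in wherever df1, df45 or df135 would be used. A rough sketch, assuming the same example data; the dictionary is built here by grouping the transposed frame, and row_sums is just a made-up computation to show the access pattern.

```python
import pandas as pd

data = {'AST_0-1': [1, 2, 3], 'AST_0-45': [4, 5, 6], 'AST_0-135': [7, 8, 20],
        'AST_10-1': [10, 20, 32], 'AST_10-45': [47, 56, 67], 'AST_10-135': [48, 57, 64],
        'AST_110-1': [100, 85, 93], 'AST_110-45': [100, 25, 37], 'AST_110-135': [44, 55, 67]}
df = pd.DataFrame(data)

# Text after the last '-' in each column name
keys = df.columns.str.rsplit('-', n=1).str[1]

# Group the rows of the transposed frame, then transpose each piece back
dfs = {name: part.T for name, part in df.T.groupby(keys)}

# Apply the same operation to every sub-dataframe through the dictionary
row_sums = {name: sub.sum(axis=1).tolist() for name, sub in dfs.items()}

print(row_sums['1'])  # [111, 107, 128]
```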

CodePudding user response：

data = {'AST_0-1': [1, 2, 3], 
        'AST_0-45': [4, 5, 6],
        'AST_0-135': [7, 8, 20],
        'AST_10-1': [10, 20, 32], 
        'AST_10-45': [47, 56, 67],
        'AST_10-135': [48, 57, 64],
        'AST_110-1': [100, 85, 93],
        'AST_110-45': [100, 25, 37],
        'AST_110-135': [44, 55, 67]}

import pandas as pd

df = pd.DataFrame(data)

value_list = ["1", "45", "135"]

for value in value_list:
    interest_columns = [col for col in df.columns if col.split("-")[1] == value]
    df_filtered = df[interest_columns]
    print(df_filtered)

Output:

   AST_0-1  AST_10-1  AST_110-1
0        1        10        100
1        2        20         85
2        3        32         93

   AST_0-45  AST_10-45  AST_110-45
0         4         47         100
1         5         56          25
2         6         67          37

   AST_0-135  AST_10-135  AST_110-135
0          7          48           44
1          8          57           55
2         20          64           67
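
The column selection inside the loop can also be written with DataFrame.filter and a regular expression. This is only a sketch of that variant, assuming the suffixes are plain digits as in the example; the dictionary name filtered is hypothetical.

```python
import pandas as pd

data = {'AST_0-1': [1, 2, 3], 'AST_0-45': [4, 5, 6], 'AST_0-135': [7, 8, 20],
        'AST_10-1': [10, 20, 32], 'AST_10-45': [47, 56, 67], 'AST_10-135': [48, 57, 64],
        'AST_110-1': [100, 85, 93], 'AST_110-45': [100, 25, 37], 'AST_110-135': [44, 55, 67]}
df = pd.DataFrame(data)

value_list = ["1", "45", "135"]

filtered = {}
for value in value_list:
    # Keep only the columns whose name ends with '-<value>'
    filtered[value] = df.filter(regex=rf'-{value}$')
    print(filtered[value])
```

Anchoring the pattern with '-' in front and '$' at the end keeps "1" from also matching columns such as AST_0-135.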

CodePudding user response：

I assume your problem is with the keys of the dictionary. You can get the list of keys with data.keys() and then iterate over it.

For example:

df1 = pd.DataFrame()
df45 = pd.DataFrame()
df135 = pd.DataFrame()
for i in list(data.keys()):
  the_key = i.split('-')
  if the_key[1] == '1':
    df1[i] = data[i]
  elif the_key[1] == '45':
    df45[i] = data[i]
  elif the_key[1] == '135':
    df135[i] = data[i]
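
With 5000 columns you probably do not want one elif per suffix. Here is a minimal sketch of the same key-splitting idea that collects the columns per suffix in a defaultdict, assuming every key contains a "-"; the names grouped and frames are made up for illustration.

```python
from collections import defaultdict

import pandas as pd

data = {'AST_0-1': [1, 2, 3], 'AST_0-45': [4, 5, 6], 'AST_0-135': [7, 8, 20],
        'AST_10-1': [10, 20, 32], 'AST_10-45': [47, 56, 67], 'AST_10-135': [48, 57, 64],
        'AST_110-1': [100, 85, 93], 'AST_110-45': [100, 25, 37], 'AST_110-135': [44, 55, 67]}

# suffix -> {column name -> values}
grouped = defaultdict(dict)
for name, values in data.items():
    suffix = name.rsplit('-', 1)[1]  # text after the last '-'
    grouped[suffix][name] = values

# Build one dataframe per suffix from the collected columns
frames = {suffix: pd.DataFrame(cols) for suffix, cols in grouped.items()}

print(frames['45'])
```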