I have a dataframe with more than 5000 columns, but here is an example of what it looks like:
data = {'AST_0-1': [1, 2, 3],
'AST_0-45': [4, 5, 6],
'AST_0-135': [7, 8, 20],
'AST_10-1': [10, 20, 32],
'AST_10-45': [47, 56, 67],
'AST_10-135': [48, 57, 64],
'AST_110-1': [100, 85, 93],
'AST_110-45': [100, 25, 37],
'AST_110-135': [44, 55, 67]}
I want to create multiple new dataframes based on the number after the "-" in the column names. For example, one dataframe with all the columns that end with "1" [df1 = (AST_0-1; AST_10-1; AST_110-1)], another for the columns that end with "45", and another for those that end with "135". I know I will need a loop, but I am having trouble selecting the columns to then create the dataframes.
CodePudding user response:
You can use str.extract on the column names to get the wanted ID, then groupby on axis=1.
Here creating a dictionary of dataframes.
group = df.columns.str.extract(r'(\d+)$', expand=False)
out = dict(list(df.groupby(group, axis=1)))
Output:
{'1':    AST_0-1  AST_10-1  AST_110-1
0        1        10        100
1        2        20         85
2        3        32         93,
 '135':    AST_0-135  AST_10-135  AST_110-135
0          7          48           44
1          8          57           55
2         20          64           67,
 '45':    AST_0-45  AST_10-45  AST_110-45
0         4         47         100
1         5         56          25
2         6         67          37}
Accessing ID 135:
out['135']
   AST_0-135  AST_10-135  AST_110-135
0          7          48           44
1          8          57           55
2         20          64           67
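A note for newer pandas versions: as far as I know, groupby with axis=1 is deprecated since pandas 2.1, so the groupby line above may emit a FutureWarning there. Below is a minimal sketch of the same idea without groupby, selecting the columns of each ID with a boolean mask. The names suffix and out are only illustrative, and it assumes every column name ends with "-" followed by digits.

```python
import pandas as pd

data = {'AST_0-1': [1, 2, 3],
        'AST_0-45': [4, 5, 6],
        'AST_0-135': [7, 8, 20],
        'AST_10-1': [10, 20, 32],
        'AST_10-45': [47, 56, 67],
        'AST_10-135': [48, 57, 64],
        'AST_110-1': [100, 85, 93],
        'AST_110-45': [100, 25, 37],
        'AST_110-135': [44, 55, 67]}
df = pd.DataFrame(data)

# ID after the last "-" for every column name (assumed to be digits only)
suffix = df.columns.str.extract(r'-(\d+)$', expand=False)

# One sub-dataframe per distinct ID, picked with a boolean column mask
out = {key: df.loc[:, suffix == key] for key in suffix.unique()}

print(out['135'])
```

The keys of out are strings ('1', '45', '135'), in the same way as in the groupby version.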
CodePudding user response:
Use:
df = pd.DataFrame(data)
dfs = dict(list(df.groupby(df.columns.str.rsplit('-', n=1).str[1], axis=1)))
Output:
>>> dfs
{'1':    AST_0-1  AST_10-1  AST_110-1
0        1        10        100
1        2        20         85
2        3        32         93,
 '135':    AST_0-135  AST_10-135  AST_110-135
0          7          48           44
1          8          57           55
2         20          64           67,
 '45':    AST_0-45  AST_10-45  AST_110-45
0         4         47         100
1         5         56          25
2         6         67          37}
I know it's strongly discouraged, but maybe you want to create dataframes like df1, df135, df45. In this case, you can use:
for name, sub_df in dfs.items():
    # works at module level only; inside a function, locals() does not create variables
    locals()[f'df{name}'] = sub_df
>>> df1
   AST_0-1  AST_10-1  AST_110-1
0        1        10        100
1        2        20         85
2        3        32         93
>>> df135
   AST_0-135  AST_10-135  AST_110-135
0          7          48           44
1          8          57           55
2         20          64           67
>>> df45
   AST_0-45  AST_10-45  AST_110-45
0         4         47         100
1         5         56          25
2         6         67          37
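If the only goal is to reach each dataframe by name, a plain dict lookup does the same job without generated variable names. Here is a minimal sketch assuming the same example data. split_by_suffix is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def split_by_suffix(df):
    """Return {suffix: sub-dataframe} for column names shaped like 'AST_0-135'."""
    # Text after the last "-" in each column name
    suffix = df.columns.str.rsplit('-', n=1).str[-1]
    return {key: df.loc[:, suffix == key] for key in suffix.unique()}


df = pd.DataFrame({'AST_0-1': [1, 2, 3],
                   'AST_0-45': [4, 5, 6],
                   'AST_0-135': [7, 8, 20],
                   'AST_10-1': [10, 20, 32],
                   'AST_10-45': [47, 56, 67],
                   'AST_10-135': [48, 57, 64],
                   'AST_110-1': [100, 85, 93],
                   'AST_110-45': [100, 25, 37],
                   'AST_110-135': [44, 55, 67]})

dfs = split_by_suffix(df)

# Explicit assignment where a named variable is really needed
df135 = dfs['135']

# Loop over all groups without knowing the IDs in advance
for name, sub_df in dfs.items():
    print(name, list(sub_df.columns))
```

This also works inside a function, where writing to locals() would not create the variables.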
CodePudding user response:
data = {'AST_0-1': [1, 2, 3],
'AST_0-45': [4, 5, 6],
'AST_0-135': [7, 8, 20],
'AST_10-1': [10, 20, 32],
'AST_10-45': [47, 56, 67],
'AST_10-135': [48, 57, 64],
'AST_110-1': [100, 85, 93],
'AST_110-45': [100, 25, 37],
'AST_110-135': [44, 55, 67]}
import pandas as pd
df = pd.DataFrame(data)
value_list = ["1", "45", "135"]
for value in value_list:
    # keep only the columns whose part after "-" matches the current value
    interest_columns = [col for col in df.columns if col.split("-")[1] == value]
    df_filtered = df[interest_columns]
    print(df_filtered)
Output:
   AST_0-1  AST_10-1  AST_110-1
0        1        10        100
1        2        20         85
2        3        32         93
   AST_0-45  AST_10-45  AST_110-45
0         4         47         100
1         5         56          25
2         6         67          37
   AST_0-135  AST_10-135  AST_110-135
0          7          48           44
1          8          57           55
2         20          64           67
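With more than 5000 columns, the IDs are probably not all known in advance. Below is a small sketch that derives value_list from the column names instead of hard-coding it, and keeps each filtered dataframe in a dict. It assumes every column name contains a "-" followed only by digits. The name frames is illustrative.

```python
import pandas as pd

data = {'AST_0-1': [1, 2, 3],
        'AST_0-45': [4, 5, 6],
        'AST_0-135': [7, 8, 20],
        'AST_10-1': [10, 20, 32],
        'AST_10-45': [47, 56, 67],
        'AST_10-135': [48, 57, 64],
        'AST_110-1': [100, 85, 93],
        'AST_110-45': [100, 25, 37],
        'AST_110-135': [44, 55, 67]}
df = pd.DataFrame(data)

# Distinct IDs found after the last "-", sorted numerically
value_list = sorted({col.rsplit("-", 1)[1] for col in df.columns}, key=int)

# Same filtering as in the loop above, but the results are stored instead of printed
frames = {}
for value in value_list:
    interest_columns = [col for col in df.columns if col.rsplit("-", 1)[1] == value]
    frames[value] = df[interest_columns]
```

Note that this rescans all column names once per ID, which is fine for a few thousand columns but grows with the number of distinct IDs.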
CodePudding user response:
I assume your problem is with the keys of the dictionary. You can get a list of the keys with data.keys() and then iterate over it, for example:
import pandas as pd

df1 = pd.DataFrame()
df45 = pd.DataFrame()
df135 = pd.DataFrame()
for i in list(data.keys()):
    # the_key[1] is the part of the column name after the "-"
    the_key = i.split('-')
    if the_key[1] == '1':
        df1[i] = data[i]
    elif the_key[1] == '45':
        df45[i] = data[i]
    elif the_key[1] == '135':
        df135[i] = data[i]
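The if/elif chain needs one branch per ID, which gets long if there are many different IDs. A possible variation, sketched here under the assumption that data is the dictionary from the question, collects the columns per ID in a defaultdict first and builds the dataframes afterwards. The names columns_by_suffix and frames are made up for this example.

```python
from collections import defaultdict

import pandas as pd

data = {'AST_0-1': [1, 2, 3],
        'AST_0-45': [4, 5, 6],
        'AST_0-135': [7, 8, 20],
        'AST_10-1': [10, 20, 32],
        'AST_10-45': [47, 56, 67],
        'AST_10-135': [48, 57, 64],
        'AST_110-1': [100, 85, 93],
        'AST_110-45': [100, 25, 37],
        'AST_110-135': [44, 55, 67]}

# suffix -> {column name: values}; a new inner dict appears for each unseen suffix
columns_by_suffix = defaultdict(dict)
for name, values in data.items():
    suffix = name.rsplit('-', 1)[1]
    columns_by_suffix[suffix][name] = values

# Build one dataframe per suffix from the collected columns
frames = {suffix: pd.DataFrame(cols) for suffix, cols in columns_by_suffix.items()}

print(frames['45'])
```

Each key is visited once, so no branch has to be added when a new ID shows up in the column names.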