I have a data frame (72 columns in total) that contains a few descriptors, some independent variables, and some response variables, as shown below:
import pandas as pd

data = {'description1': ['first_value', 'second_value'],
        'description2': ['first_value', 'second_value'],
        'Ind_var': ['first_value', 'second_value'],
        'Ind_var1': ['first_value', 'second_value'],
        'Ind_var2': ['first_value', 'second_value'],
        'Response1': ['first_value', 'second_value'],
        'Response2': ['first_value', 'second_value'],
        'Response3': ['first_value', 'second_value']
        }
d0 = pd.DataFrame(data)
My goal is to split them and create three different data frames, one for every response variable. At the end, I would like to get a list where each element is a data frame, i.e.
d1 = d0[['description1', 'description2', 'Ind_var', 'Ind_var1', 'Ind_var2', 'Response1']]
d2 = d0[['description1', 'description2', 'Ind_var', 'Ind_var1', 'Ind_var2', 'Response2']]
d3 = d0[['description1', 'description2', 'Ind_var', 'Ind_var1', 'Ind_var2', 'Response3']]
df_list = [d1, d2, d3]
I did this in R as follows:
l1 <- split.default(full_df,
c(rep('Conditions', 4),
rep('Dependent', 7),
rep('Independent', ncol(full_df)-11)))
l2 <- lapply(l1$Dependent, function(i) data.frame(l1$Independent, i))
CodePudding user response:
If the columns you want to split off sit at positions 4 through 10 of d0.columns (the slice d0.columns[4:11]):
cols = d0.columns.difference(d0.columns[4:11], sort=False).to_list()
dfs = [d0[cols + [col]] for col in d0.columns[4:11]]
cols is a list of the names of the columns you always want to keep. Passing sort=False preserves their original order; by default, difference returns the names sorted. The second line uses a list comprehension to build one dataframe for every column in the slice, i.e. for every response column. Each dataframe contains the cols columns plus exactly one response column.
If d0.columns[4:11] doesn't return the columns you want, adjust the slice (the [4:11] part); the rest works the same way.
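As a rough, runnable sketch of the position-based approach, here it is applied to the small d0 from the question. The slice [5:8] is an assumption that fits this toy frame only (its three response columns sit at positions 5 to 7), and resp_cols is a name introduced here for illustration; the real 72-column frame would need its own slice.

```python
import pandas as pd

# Toy frame with the same column layout as the question's d0.
d0 = pd.DataFrame({
    'description1': ['first_value', 'second_value'],
    'description2': ['first_value', 'second_value'],
    'Ind_var': ['first_value', 'second_value'],
    'Ind_var1': ['first_value', 'second_value'],
    'Ind_var2': ['first_value', 'second_value'],
    'Response1': ['first_value', 'second_value'],
    'Response2': ['first_value', 'second_value'],
    'Response3': ['first_value', 'second_value'],
})

# Assumption: in this toy frame the response columns are at positions 5, 6, 7.
resp_cols = d0.columns[5:8]

# Everything outside the slice is kept in every output frame.
# sort=False keeps the columns in their original order.
cols = d0.columns.difference(resp_cols, sort=False).to_list()

# One dataframe per response column: shared columns + that response.
dfs = [d0[cols + [col]] for col in resp_cols]

for frame in dfs:
    print(frame.columns.to_list())
```

Each printed list should end with a different response column, which is a quick way to check that the slice picked the columns you intended.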
CodePudding user response:
If you know that the response column names consistently begin with Response, you can easily split the columns:
resps = [i for i in d0.columns if i.startswith('Response')]
commons = [i for i in d0.columns if not i.startswith('Response')]
From that it is trivial to build a list of dataframes, or a dictionary keyed by the response column name:
framelist = [d0[commons + [i]] for i in resps]
or
framedict = {i: d0[commons + [i]] for i in resps}
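Put together, the name-based split might look like the following self-contained sketch. It reuses the question's toy d0 and assumes that every response column, and no other column, starts with the prefix Response.

```python
import pandas as pd

# Toy frame with the same column layout as the question's d0.
d0 = pd.DataFrame({
    'description1': ['first_value', 'second_value'],
    'description2': ['first_value', 'second_value'],
    'Ind_var': ['first_value', 'second_value'],
    'Ind_var1': ['first_value', 'second_value'],
    'Ind_var2': ['first_value', 'second_value'],
    'Response1': ['first_value', 'second_value'],
    'Response2': ['first_value', 'second_value'],
    'Response3': ['first_value', 'second_value'],
})

# Split the column names by prefix (assumption: the prefix is reliable).
resps = [c for c in d0.columns if c.startswith('Response')]
commons = [c for c in d0.columns if not c.startswith('Response')]

# List form: same order as the response columns appear in d0.
framelist = [d0[commons + [r]] for r in resps]

# Dict form: look a frame up by the name of its response column.
framedict = {r: d0[commons + [r]] for r in resps}

print(framedict['Response2'].columns.to_list())
```

The dictionary form tends to be easier to work with later, since each frame can be retrieved by its response name rather than by its position in the list.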