Split data frame to many

Time:05-03

I have a data frame (72 columns in total) that contains a few descriptors, some independent variables, and some response variables, as shown below:

import pandas as pd

data = {'description1': ['first_value', 'second_value'],
        'description2': ['first_value', 'second_value'],
        'Ind_var': ['first_value', 'second_value'],
        'Ind_var1': ['first_value', 'second_value'],
        'Ind_var2': ['first_value', 'second_value'],
        'Response1': ['first_value', 'second_value'],
        'Response2': ['first_value', 'second_value'],
        'Response3': ['first_value', 'second_value']
        }

d0 = pd.DataFrame(data)

My goal is to split it into three different data frames, one for every response variable. In the end, I would like to get a list where each element is a data frame, i.e.

d1 = d0[['description1', 'description2', 'Ind_var', 'Ind_var1', 'Ind_var2', 'Response1']]
d2 = d0[['description1', 'description2', 'Ind_var', 'Ind_var1', 'Ind_var2', 'Response2']]
d3 = d0[['description1', 'description2', 'Ind_var', 'Ind_var1', 'Ind_var2', 'Response3']]
df_list = [d1, d2, d3]

In R, I did this as follows:

l1 <- split.default(full_df, 
                    c(rep('Conditions', 4), 
                      rep('Dependent', 7), 
                      rep('Independent', ncol(full_df)-11)))

l2 <- lapply(l1$Dependent, function(i) data.frame(l1$Independent, i))

CodePudding user response:

If the columns you want to split out are the ones at positions 4 to 10 of d0.columns (that is, the slice [4:11]):

cols = d0.columns.difference(d0.columns[4:11]).to_list()
dfs = [d0[cols + [col]] for col in d0.columns[4:11]]

cols is a list of the names of the columns you always want to keep. The second line uses a list comprehension to build a list of dataframes, one for every column in the slice, i.e. one per response column.

Every dataframe will contain the cols columns, plus exactly 1 response column.

If the slice [4:11] doesn't select the columns you want, adjust it to the right positions; the rest works the same way.
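As a rough illustration of the positional approach, here is a minimal sketch on a toy frame modelled on the question. The values are made up, and the slice [5:8] is an assumption that only fits this 8-column toy frame, where the three response columns come last. One thing to watch: Index.difference sorts the names by default, so sort=False is passed here to keep the kept columns in their original order.

```python
import pandas as pd

# Toy frame modelled on the question (8 columns instead of 72); values are made up.
d0 = pd.DataFrame({
    'description1': ['a', 'b'],
    'description2': ['c', 'd'],
    'Ind_var': [1, 2],
    'Ind_var1': [3, 4],
    'Ind_var2': [5, 6],
    'Response1': [0.1, 0.2],
    'Response2': [0.3, 0.4],
    'Response3': [0.5, 0.6],
})

# In this toy frame the response columns sit at positions 5, 6 and 7.
response_cols = d0.columns[5:8]

# sort=False keeps the original column order; the default may sort the names.
cols = d0.columns.difference(response_cols, sort=False).to_list()

# One dataframe per response column: the shared columns plus that response.
dfs = [d0[cols + [col]] for col in response_cols]
```

Each element of dfs then has six columns: the five shared ones followed by a single response column.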

CodePudding user response:

If you know that the response column names consistently begin with Response, you can easily split the columns:

resps = [i for i in d0.columns if i.startswith('Response')]
commons = [i for i in d0.columns if not i.startswith('Response')]

From that it is trivial to build a list of dataframes, or a dictionary indexed by the response:

framelist = [d0[commons + [i]] for i in resps]

or

framedict = {i: d0[commons + [i]] for i in resps}
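To show the name-based split end to end, here is a small self-contained sketch. It assumes every response column name starts with 'Response' and that all column names are strings; the toy values are made up.

```python
import pandas as pd

# Toy frame modelled on the question; values are made up.
d0 = pd.DataFrame({
    'description1': ['a', 'b'],
    'description2': ['c', 'd'],
    'Ind_var': [1, 2],
    'Ind_var1': [3, 4],
    'Ind_var2': [5, 6],
    'Response1': [0.1, 0.2],
    'Response2': [0.3, 0.4],
    'Response3': [0.5, 0.6],
})

# Partition the column names by prefix.
resps = [c for c in d0.columns if c.startswith('Response')]
commons = [c for c in d0.columns if not c.startswith('Response')]

# One dataframe per response, keyed by the response column name.
framedict = {c: d0[commons + [c]] for c in resps}

# Look up a single frame by name instead of by list position.
d2 = framedict['Response2']
```

The dictionary form can be handy when the frames are later processed per response, since the key records which response each frame belongs to; list(framedict.values()) gives the plain list if that is all you need.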