Dynamically Concatenate/Merge Columns of Same Name from n Data Frames Into New Data Frame

Time:03-24

I have 17 data frames in a list, dataframes. They all have the same column names and the same length; only the first column, which identifies the source of the data, differs between them. Seven columns describe the date, and these are identical row by row across all data frames. The remaining 11 columns hold the data, for a total of 19 columns per data frame. I would like to dynamically concatenate the columns that share a name, producing 11 new data frames with 24 columns each: the 7 date columns plus 17 columns, one taken from each of the original data frames.

Below is just an example of 3 data frames and the expected outcome.

import numpy as np
import pandas as pd

# Note: np.array with mixed types stores every value as a string.
df1 = pd.DataFrame(np.array([
['a', 1, 3, 9],
['a', 2, 4, 61],
['a', 3, 24, 9]]),
columns=['name', 'date', 'attr11', 'attr12'])

df2 = pd.DataFrame(np.array([
['b', 1, 5, 19],
['b', 2, 14, 16],
['b', 3, 4, 9]]),
columns=['name', 'date', 'attr11', 'attr12'])

df3 = pd.DataFrame(np.array([
['c', 1, 3, 49],
['c', 2, 4, 36],
['c', 3, 14, 9]]),
columns=['name', 'date', 'attr11', 'attr12'])

Result

dfattr11 = pd.DataFrame(np.array([
[1, 3, 5, 3],
[2, 4, 14, 4],
[3, 24, 4, 14]]),
columns=['date', 'attr11', 'attr11', 'attr11'])

dfattr12...

new_dataframes = [dfattr11, dfattr12, ...]

I tried using "Pandas Python: Concatenate dataframes having same columns" for guidance, but that solution stacks the data frames vertically rather than placing the columns side by side.

I know how to use concat to create a single new data frame, but the challenge is doing it iteratively or dynamically: there are 17 data frames, each with 11 data columns that need to go into their own separate data frame. Any help would be greatly appreciated.

Answer:

IIUC, you could use pandas.concat with axis=1 to build one wide dataframe holding all the data, then split it by column name using groupby. The result is a dictionary of dataframes keyed by column name:

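dfs = [df1, df2, df3]

# Key each frame by its source name and index it by date, then concatenate
# side by side: the columns become a MultiIndex of (name, attribute).
# Grouping on level 1 (the attribute) yields one sub-frame per attribute;
# droplevel(0) removes the source-name level from each sub-frame's columns.
# Note: groupby(..., axis=1) is deprecated since pandas 2.1.
out = {k: d.droplevel(0, axis=1) for k, d in
       pd.concat({d['name'].iloc[0]: d.set_index('date')
                                      .drop(columns='name')
                  for d in dfs}, axis=1)
         .groupby(level=1, axis=1)
}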

Output:

{'attr11':      attr11 attr11 attr11
 date                     
 1         3      5      3
 2         4     14      4
 3        24      4     14,
 'attr12':      attr12 attr12 attr12
 date                     
 1         9     19     49
 2        61     16     36
 3         9      9      9}
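
If groupby over columns is not available in your pandas version, the same split can be sketched with a plain loop over the attribute columns. This is only an illustration, not the answer's method: key_cols, source_col, value_cols and collect are hypothetical names, the toy frames are rebuilt from dicts so the values stay integers, and the result columns are labelled with the source name instead of repeating the attribute name.

```python
import pandas as pd

# Toy frames mirroring the question's example (dicts keep integer dtypes).
df1 = pd.DataFrame({'name': ['a'] * 3, 'date': [1, 2, 3],
                    'attr11': [3, 4, 24], 'attr12': [9, 61, 9]})
df2 = pd.DataFrame({'name': ['b'] * 3, 'date': [1, 2, 3],
                    'attr11': [5, 14, 4], 'attr12': [19, 16, 9]})
df3 = pd.DataFrame({'name': ['c'] * 3, 'date': [1, 2, 3],
                    'attr11': [3, 4, 14], 'attr12': [49, 36, 9]})
dfs = [df1, df2, df3]

key_cols = ['date']   # assumption: the shared date column(s); 7 in the real data
source_col = 'name'   # assumption: the column identifying each frame's source

# Every column that is neither a key nor the source label gets its own frame.
value_cols = [c for c in dfs[0].columns if c not in key_cols + [source_col]]

def collect(col):
    # One Series per source frame, indexed by the key column(s) so rows align.
    parts = {d[source_col].iloc[0]: d.set_index(key_cols)[col] for d in dfs}
    # axis=1 places the Series side by side; the dict keys become column labels.
    return pd.concat(parts, axis=1).reset_index()

new_dataframes = {col: collect(col) for col in value_cols}
print(new_dataframes['attr11'])
```

With the real data, key_cols would list all 7 date columns, and list(new_dataframes.values()) gives the list of 11 data frames the question asks for.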