Home > Software design >  assign higher header level to pandas data frame in the read_csv command
assign higher header level to pandas data frame in the read_csv command

Time:09-28

I am reading multiple csv files into pandas and concatenating them into one frame. Each csv file shape is 8x256. I am using the following code:

path = r'may12_fm'
all_files = glob.glob(os.path.join(path, '*.csv'))
df_may= pd.concat((pd.read_csv(f, header=None) for f in all_files ), ignore_index=True, axis=1,)

I want to keep the name of each file as a higher level header, and also the lower header numbering to be from 0 to 255 for all concatenated frames, something like this:

--- m_0 ----------------- m_1 -------------- m_3
--- 0 1 2 3 4 .... 255 ---- 0, 1, 2, ...255------ 0, 1, 2, .....255
0
1
2
3
4 ...

I appreciate helping me in this. Thanks

CodePudding user response:

Since you know the files names used, and (if) you know the number of columns per file, maybe you could just do it after the files are loaded.

Example frame

   0  1  2  0  1  2
0  1  1  1  1  1  1
1  2  2  2  2  2  2
2  3  3  3  3  3  3
3  4  4  4  4  4  4

What this answer assumes you know

all_files = ['m_0','m_1']
cols_per_file = 3

Then you could build a multi-index

idx = pd.MultiIndex.from_arrays([np.repeat(all_files, cols_per_file), df.columns])
print(idx) 

MultiIndex([('m_0', 0),
            ('m_0', 1),
            ('m_0', 2),
            ('m_1', 0),
            ('m_1', 1),
            ('m_1', 2)],
           )

And then assign back to the frame

df.columns = idx
print(df)

Result

  m_0       m_1      
    0  1  2   0  1  2
0   1  1  1   1  1  1
1   2  2  2   2  2  2
2   3  3  3   3  3  3
3   4  4  4   4  4  4
  • Related