Home > OS >  Combine multiple csv files column wise with the header as file name
Combine multiple csv files column wise with the header as file name

Time:11-30

I couldn't find the exact code on the platform which is why posting it for suggestions.

I have multiple CSV files (about 100) with the same data formats and header names.

 ,Mean,SD
1,96.432,13.899
2,96.432,13.899
3,96.432,13.899
4,96.432,13.899
5,96.432,13.899

I want to append all files column-wise so that I have them in one file. Also, the header of each data should be the file name so that I can follow which data belongs to which file. For example, above mean, sd--> another row of the file name.

Please guide me, as I am new to Python.

Thank you and regards, Khan.

CodePudding user response:

The question was vague about formatting, so this may vary from the desired output.

filenames = [...]
dfs = []
for f in filenames:
    newdf = pd.read_csv(f)
    newdf.rename(columns={'Mean': 'Mean '   f, 'SD': 'SD '   f})
    dfs.append(newdf)
df = pd.concat(dfs)

CodePudding user response:

You can use pandas to read and concatenate the files, together with glob and a dictionary comprehension:

from glob import glob
import pandas as pd

files = glob('/tmp/*.csv') # change the location/pattern accordingly
# if you have a list of files, use: files=['file1.csv', 'file2.csv'...]

df = pd.concat({fname.rsplit('/')[-1]: pd.read_csv(fname, index_col=0)
                for fname in files}, axis=1)

output:

>>> print(df)
  file1.csv         file2.csv        
       Mean      SD      Mean      SD
                                     
1    96.432  13.899    96.432  13.899
2    96.432  13.899    96.432  13.899
3    96.432  13.899    96.432  13.899
4    96.432  13.899    96.432  13.899
5    96.432  13.899    96.432  13.899

Saving to new file:

df.to_csv('concatenated_file.csv')

output:

,file1.csv,file1.csv,file2.csv,file2.csv
,Mean,SD,Mean,SD
 ,,,,
1,96.432,13.899,96.432,13.899
2,96.432,13.899,96.432,13.899
3,96.432,13.899,96.432,13.899
4,96.432,13.899,96.432,13.899
5,96.432,13.899,96.432,13.899

CodePudding user response:

you can use pandas to work with

In [3]: import pandas  

                                                                                                                           

In [4]: import pandas as pd                                                                                                                           

In [13]: ls                                                                                                                                           
abc1.csv  abc.csv

In [14]: df = pd.read_csv('abc.csv')                                                                                                                  

In [15]: df1 = pd.read_csv('abc1.csv')                                                                                                                

In [16]: df                                                                                                                                           
Out[16]: 
        Mean      SD
0  1  96.432  13.899
1  2  96.432  13.899


In [16]: df                                                                                                                                           
Out[16]: 
        Mean      SD
0  1  96.432  13.899
1  2  96.432  13.899

In [17]: df1                                                                                                                                          
Out[17]: 
        Mean      SD
0  3  96.432  13.899
1  4  96.432  13.899
2  5  96.432  13.899

In [18]: df.append(df1)                                                                                                                               
Out[18]: 
        Mean      SD
0  1  96.432  13.899
1  2  96.432  13.899
0  3  96.432  13.899
1  4  96.432  13.899
2  5  96.432  13.899

In [19]: ds = df.append(df1)                                                                                                                          

In [20]: ds                                                                                                                                           
Out[20]: 
        Mean      SD
0  1  96.432  13.899
1  2  96.432  13.899
0  3  96.432  13.899
1  4  96.432  13.899
2  5  96.432  13.899



In [21]: ds.to_csv('file1.csv')  


In [23]: ls                                                                                                                                           
abc1.csv  abc.csv  file1.csv

To deal with multiple files

In [82]: import  pandas as pd                                                                                                                         

In [83]: import os, glob                                                                                                                              

In [84]: s = glob.glob(os.path.join(os.getcwd(),'*.csv'))                                                                                             

In [85]: s                                                                                                                                            
Out[85]: 
['/home/thinkpad/Desktop/stackoverflow/abc1.csv',
 '/home/thinkpad/Desktop/stackoverflow/abc.csv']


In [90]: df = pd.DataFrame(columns = ['in','Mean','SD']) 
    ...: for i in s: 
    ...:     df1 = pd.read_csv(i) 
    ...:     print(df1.head()) 
    ...:     df = df.append(df1) 

In [91]: df                                                                                                                                           
Out[91]: 
  in    Mean      SD
0  3  96.432  13.899
1  4  96.432  13.899
2  5  96.432  13.899
0  1  96.432  13.899
1  2  96.432  13.899
  • Related