I couldn't find the exact code on the platform which is why posting it for suggestions.
I have multiple CSV files (about 100) with the same data formats and header names.
,Mean,SD
1,96.432,13.899
2,96.432,13.899
3,96.432,13.899
4,96.432,13.899
5,96.432,13.899
I want to append all files column-wise so that I have them in one file. Also, the header of each data should be the file name so that I can follow which data belongs to which file. For example, above mean, sd--> another row of the file name.
Please guide me, as I am new to Python.
Thank you and regards, Khan.
CodePudding user response:
The question was vague about formatting, so this may vary from the desired output.
filenames = [...]
dfs = []
for f in filenames:
newdf = pd.read_csv(f)
newdf.rename(columns={'Mean': 'Mean ' f, 'SD': 'SD ' f})
dfs.append(newdf)
df = pd.concat(dfs)
CodePudding user response:
You can use pandas
to read and concatenate the files, together with glob
and a dictionary comprehension:
from glob import glob
import pandas as pd
files = glob('/tmp/*.csv') # change the location/pattern accordingly
# if you have a list of files, use: files=['file1.csv', 'file2.csv'...]
df = pd.concat({fname.rsplit('/')[-1]: pd.read_csv(fname, index_col=0)
for fname in files}, axis=1)
output:
>>> print(df)
file1.csv file2.csv
Mean SD Mean SD
1 96.432 13.899 96.432 13.899
2 96.432 13.899 96.432 13.899
3 96.432 13.899 96.432 13.899
4 96.432 13.899 96.432 13.899
5 96.432 13.899 96.432 13.899
Saving to new file:
df.to_csv('concatenated_file.csv')
output:
,file1.csv,file1.csv,file2.csv,file2.csv
,Mean,SD,Mean,SD
,,,,
1,96.432,13.899,96.432,13.899
2,96.432,13.899,96.432,13.899
3,96.432,13.899,96.432,13.899
4,96.432,13.899,96.432,13.899
5,96.432,13.899,96.432,13.899
CodePudding user response:
you can use pandas to work with
In [3]: import pandas
In [4]: import pandas as pd
In [13]: ls
abc1.csv abc.csv
In [14]: df = pd.read_csv('abc.csv')
In [15]: df1 = pd.read_csv('abc1.csv')
In [16]: df
Out[16]:
Mean SD
0 1 96.432 13.899
1 2 96.432 13.899
In [16]: df
Out[16]:
Mean SD
0 1 96.432 13.899
1 2 96.432 13.899
In [17]: df1
Out[17]:
Mean SD
0 3 96.432 13.899
1 4 96.432 13.899
2 5 96.432 13.899
In [18]: df.append(df1)
Out[18]:
Mean SD
0 1 96.432 13.899
1 2 96.432 13.899
0 3 96.432 13.899
1 4 96.432 13.899
2 5 96.432 13.899
In [19]: ds = df.append(df1)
In [20]: ds
Out[20]:
Mean SD
0 1 96.432 13.899
1 2 96.432 13.899
0 3 96.432 13.899
1 4 96.432 13.899
2 5 96.432 13.899
In [21]: ds.to_csv('file1.csv')
In [23]: ls
abc1.csv abc.csv file1.csv
To deal with multiple files
In [82]: import pandas as pd
In [83]: import os, glob
In [84]: s = glob.glob(os.path.join(os.getcwd(),'*.csv'))
In [85]: s
Out[85]:
['/home/thinkpad/Desktop/stackoverflow/abc1.csv',
'/home/thinkpad/Desktop/stackoverflow/abc.csv']
In [90]: df = pd.DataFrame(columns = ['in','Mean','SD'])
...: for i in s:
...: df1 = pd.read_csv(i)
...: print(df1.head())
...: df = df.append(df1)
In [91]: df
Out[91]:
in Mean SD
0 3 96.432 13.899
1 4 96.432 13.899
2 5 96.432 13.899
0 1 96.432 13.899
1 2 96.432 13.899