I have multiple (more than 100) files like this:
filename: 00.csv
residue, vwd, total
AAA,0.00, -9.45
BBB, 0.45, -1.45
CCC, 0.44, -3
DDD, 0.1, -10
filename: 01.csv
residue, vwd, total
AAA, 2, -0.56
BBB, -4, -9.32
CCC, 2.54, -10
DDD, 3, -6.4
...
I would like to create a matrix in a new csv file where the first column is "residue", and the others are based on filename (without extension). Below filename, it should be values from "total" column. It would be like this:
residue, 00, 01, ...
AAA, -9.45, -0,56, ...
BBB, -1.45, -9.32, ...
CCC, -3, -10,...
DDD, -10, -6.4, ...
. . .
Thanks in advance!
CodePudding user response:
This will work:
files = ['00.csv', '01.csv']
dfs = []
for file in files:
df = pd.read_csv(file)
df.columns = df.columns.str.strip()
df = df[['residue', 'total']].rename({'total': os.path.splitext(file)[0]}, axis=1)
dfs.append(df)
df = cols[0]
for sub_df in cols[1:]:
df = df.merge(sub_df, on='residue')
Output:
>>> df
residue 00 01
0 AAA -9.45 -0.56
1 BBB -1.45 -9.32
2 CCC -3.00 -10.00
3 DDD -10.00 -6.40
CodePudding user response:
Lets try read the files then pivot table
first get all csv files in that dir
import os
#current directory
path = "."
file_names = list(filter(lambda x:x.endswith(".csv"),os.listdir(path)))
print(file_names)
['filename: 00.csv', 'filename: 01.csv']
df = pd.DataFrame()
for file in file_names:
df = df.append(pd.concat([pd.read_csv(file),pd.DataFrame([file]*4,
columns= ["filename"])],axis=1))
# to extract digit from filename
df["filename"] =df["filename"].str.extract('(\d )')
print(df)
residue vwd total filename 0 AAA 0.00 -9.45 00 1 BBB 0.45 -1.45 00 2 CCC 0.44 -3.00 00 3 DDD 0.10 -10.00 00 0 AAA 2.00 -0.56 01 1 BBB -4.00 -9.32 01 2 CCC 2.54 -10.00 01 3 DDD 3.00 -6.40 01
Then pivot_table
will do the job
df.pivot_table(values='total', index='residue', columns='filename')
Final output:
filename 00 01 residue AAA -9.45 -0.56 BBB -1.45 -9.32 CCC -3.00 -10.00 DDD -10.00 -6.40