I have many *.txt
files in a folder, each file has the same structure:
same four columns but different number of rows in each file.
1.txt
0.00781238 0.00082577 0.00781238 -0.03314453
0.01562476 0.00064219 0.01562476 -0.03145172
0.02343713 -0.00105686 0.02343713 -0.03775329
0.03124951 -0.00016571 0.03124951 -0.00232971
0.03906189 0.00240464 0.03906189 0.04244080
2.txt
0.00781238 -0.00409547 0.00781238 -0.01004241
0.01562476 0.00239794 0.01562476 0.03342835
0.02343713 -0.00112586 0.02343713 0.00152711
0.03124951 0.00374492 0.03124951 0.03581025
0.03906189 0.00124744 0.03906189 -0.01969909
0.04687427 0.00192426 0.04687427 0.05292329
0.05468665 0.00266910 0.05468665 0.02569642
and more txt files from 1 to 65.
I want to have the result like this
Samplename value1 value2 value3 value4
1.txt 0.00781238 0.00082577 0.00781238 -0.03314453
1.txt 0.01562476 0.00064219 0.01562476 -0.03145172
1.txt 0.02343713 -0.00105686 0.02343713 -0.03775329
1.txt 0.03124951 -0.00016571 0.03124951 -0.00232971
...
2.txt
2.txt
...
65.txt
CodePudding user response:
Find the files using os.walk, then concat the individual dataframes created from each file:
import pandas as pd
import os
txtfolder = r'/home/bera/Desktop/textfiles/'
#Find the textfiles
textfiles = []
for root, folder, files in os.walk(txtfolder):
for file in files:
if file.endswith('.txt'):
fullname = os.path.join(root, file)
textfiles.append(fullname)
textfiles.sort() #Sort the filesnames
for filenum, file in enumerate(textfiles, 1):
if filenum==1:
df = pd.read_csv(file, names=['value1','value2','value3','value4'], delim_whitespace=True)
df['Samplename']=os.path.basename(file)
else:
tempdf = pd.read_csv(file, names=['value1','value2','value3','value4'], delim_whitespace=True)
tempdf['Samplename']=os.path.basename(file)
df = pd.concat([df, tempdf], ignore_index=True)
df = df[['Samplename','value1','value2','value3','value4']] #Reorder columns
# Samplename value1 value2 value3 value4
# 0 1.txt 0.007812 0.000826 0.007812 -0.033145
# 1 1.txt 0.015625 0.000642 0.015625 -0.031452
# 2 1.txt 0.023437 -0.001057 0.023437 -0.037753
# 3 1.txt 0.031250 -0.000166 0.031250 -0.002330
# 4 1.txt 0.039062 0.002405 0.039062 0.042441
# 5 2.txt 0.007812 -0.004095 0.007812 -0.010042
# 6 2.txt 0.015625 0.002398 0.015625 0.033428
CodePudding user response:
You could try the following:
df = pd.concat(
(pd.read_csv(f"{n}.txt", delim_whitespace=True, header=None)
.assign(Samplename=f"{n}.txt")
for n in range(1, 66)),
ignore_index=True
).rename(columns={n: f"value{n 1}" for n in range(4)})
df = df[["Samplename"] list(df.columns[:-1])]
- Reading the
n.txt
files withpandas.read_csv()
(whitespace as delimiter, no header), adding the current filename in a column namedSamplename
, and conatenating the resulting dataframes into one. - Renameming the 4 numbered columns.
- Moving the column
Samplename
to the front (cosmetic).