Home > OS >  How to append multiple text files and put each file name in a new column in pandas?
How to append multiple text files and put each file name in a new column in pandas?

Time:07-25

I have many *.txt files in a folder, each file has the same structure: same four columns but different number of rows in each file.

1.txt
  0.00781238   0.00082577   0.00781238  -0.03314453
  0.01562476   0.00064219   0.01562476  -0.03145172
  0.02343713  -0.00105686   0.02343713  -0.03775329
  0.03124951  -0.00016571   0.03124951  -0.00232971
  0.03906189   0.00240464   0.03906189   0.04244080

2.txt
  0.00781238  -0.00409547   0.00781238  -0.01004241
  0.01562476   0.00239794   0.01562476   0.03342835
  0.02343713  -0.00112586   0.02343713   0.00152711
  0.03124951   0.00374492   0.03124951   0.03581025
  0.03906189   0.00124744   0.03906189  -0.01969909
  0.04687427   0.00192426   0.04687427   0.05292329
  0.05468665   0.00266910   0.05468665   0.02569642

and more txt files from 1 to 65.

I want to have the result like this

Samplename   value1       value2       value3       value4  
1.txt        0.00781238   0.00082577   0.00781238  -0.03314453
1.txt        0.01562476   0.00064219   0.01562476  -0.03145172
1.txt        0.02343713  -0.00105686   0.02343713  -0.03775329
1.txt        0.03124951  -0.00016571   0.03124951  -0.00232971
...            
2.txt
2.txt
...
65.txt

CodePudding user response:

Find the files using os.walk, then concat the individual dataframes created from each file:

import pandas as pd
import os

txtfolder = r'/home/bera/Desktop/textfiles/'

#Find the textfiles
textfiles = []
for root, folder, files in os.walk(txtfolder):
    for file in files:
        if file.endswith('.txt'):
            fullname = os.path.join(root, file)
            textfiles.append(fullname)
textfiles.sort() #Sort the filesnames

for filenum, file in enumerate(textfiles, 1):
    if filenum==1:
        df = pd.read_csv(file, names=['value1','value2','value3','value4'], delim_whitespace=True)
        df['Samplename']=os.path.basename(file)
    else:
        tempdf = pd.read_csv(file, names=['value1','value2','value3','value4'], delim_whitespace=True)
        tempdf['Samplename']=os.path.basename(file)
        df = pd.concat([df, tempdf], ignore_index=True)
        
df = df[['Samplename','value1','value2','value3','value4']] #Reorder columns

#    Samplename    value1    value2    value3    value4
# 0       1.txt  0.007812  0.000826  0.007812 -0.033145
# 1       1.txt  0.015625  0.000642  0.015625 -0.031452
# 2       1.txt  0.023437 -0.001057  0.023437 -0.037753
# 3       1.txt  0.031250 -0.000166  0.031250 -0.002330
# 4       1.txt  0.039062  0.002405  0.039062  0.042441
# 5       2.txt  0.007812 -0.004095  0.007812 -0.010042
# 6       2.txt  0.015625  0.002398  0.015625  0.033428

CodePudding user response:

You could try the following:

df = pd.concat(
    (pd.read_csv(f"{n}.txt", delim_whitespace=True, header=None)
       .assign(Samplename=f"{n}.txt")
     for n in range(1, 66)),
    ignore_index=True
).rename(columns={n: f"value{n   1}" for n in range(4)})
df = df[["Samplename"]   list(df.columns[:-1])]
  • Reading the n.txt files with pandas.read_csv() (whitespace as delimiter, no header), adding the current filename in a column named Samplename, and conatenating the resulting dataframes into one.
  • Renameming the 4 numbered columns.
  • Moving the column Samplename to the front (cosmetic).
  • Related