I have 100s of text files that have a header row and then a series of numbers in each column (example below):
Textfile 1
x | y |
---|---|
0.001 | 5.45 |
0.23 | 6.83 |
0.03 | 4.86 |
Textfile 2
x | y |
---|---|
0.003 | 7.34 |
0.89 | 3.89 |
0.45 | 5.66 |
I would like to create a new pandas data frame (example below) that contains the values in column 2 of each file as a new column that has the name of the text file.
Textfile 1 | Textfile 2 |
---|---|
5.45 | 7.34 |
6.83 | 3.89 |
4.86 | 5.66 |
I was thinking along the lines of a for loop that loops through all the text files in the folder and at each iteration adds a column to the new data frame (something like the code below)
all_files=glob.glob(os.path.join(path,"*.txt"))
df_ppm = []
for file in all_files:
file_name = os.path.split(file)[1]
df_ppm.append(pd.read_csv(file_name))
CodePudding user response:
This should do what you need :)
import os
import pandas as pd
import glob
path = r'C:\Users\dane\Documents\pythonPlayground\csv_files' # use your path
all_files = glob.glob(path "/*.csv") # this is a file filter so you only read csv files
new_df = pd.DataFrame() # create a new empty dataframe
for file_path in all_files: # loop through each file
file_name = os.path.basename(file_path) # get the file name
df = pd.read_csv(file_path, index_col=None, header=0) # read a file and add it to a dataframe
new_df[file_name] = df['y'] # assuming the column is always called 'y' you can just add it to the new DF this way, otherwise, use df.iloc[1]
print(new_df)
CodePudding user response:
You can read only the columns of files using usecols
in read_csv since the column name('y') is the same in all the .txt files
all_files=glob.glob(os.path.join(path,"*.txt"))
df_ppm = {}
for file in all_files:
file_name = os.path.split(file)[1]
df_ppm[file_name[:-3]] = pd.read_csv(file_name, usecols=['y'])['y'].to_list()
df_ppm = pd.DataFrame(df_ppm)