How to extract a specific value from multiple CSV files in a directory and append them to a dataframe?

I have a directory with hundreds of CSV files, each representing the pixels of a thermal camera image (288x383). I want to get the center value of each file (e.g. 144 x 191) and collect those values in a dataframe alongside the name of each file.

Here is my code, where I created a dataframe listing the CSV files:

import os
import glob
import numpy as np
import pandas as pd
os.chdir("/Programming/Proj1/Code/Image_Data")

!ls

Out:
2021-09-13_13-42-16.csv
2021-09-13_13-42-22.csv
2021-09-13_13-42-29.csv
2021-09-13_13-42-35.csv
2021-09-13_13-42-47.csv
2021-09-13_13-42-53.csv
...

file_extension = '.csv'
# glob does not guarantee any order, so sort to get the listing shown above
all_filenames = sorted(glob.glob(f"*{file_extension}"))
files = all_filenames

all_df = pd.DataFrame(all_filenames, columns=['Full_name'])

all_df.head()
    Full_name
0   2021-09-13_13-42-16.csv
1   2021-09-13_13-42-22.csv
2   2021-09-13_13-42-29.csv
3   2021-09-13_13-42-35.csv
4   2021-09-13_13-42-47.csv
5   2021-09-13_13-42-53.csv
6   2021-09-13_13-43-00.csv

CodePudding user response:

You can loop through your files one by one, reading each into a dataframe and taking the center value you want, then saving that value along with the file name. The resulting list can then be loaded into a new dataframe ready for you to use.

result = []
row, col = 144, 191  # 0-based center pixel from the question; adjust as needed
for file in files:
    # read in the file, you may need to specify some extra parameters
    # (check the pandas docs for read_csv); header=None assumes the file
    # holds only pixel values, with no header row
    df = pd.read_csv(file, header=None)

    # now select the value you want by position
    # use df.loc[...] instead if your files have labelled indexes or columns
    value = df.iloc[row, col]

    # append to the list
    result.append((file, value))

# you should now have a list in the format:
# [('2021-09-13_13-42-16.csv', 100), ('2021-09-13_13-42-22.csv', 255), ...]

# load the list of tuples as a dataframe for further processing or analysis...
result_df = pd.DataFrame(result, columns=['Full_name', 'center_value'])
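
If you need this in more than one place, the loop above can be wrapped in a small function. The following is only a minimal sketch under some assumptions that the question does not confirm: each file is a comma-separated grid of numbers with no header row and no index column, and the pixel of interest is at 0-based position (144, 191). The function name `collect_center_values` and the column name `center_value` are made up for illustration.

```python
from pathlib import Path

import pandas as pd


def collect_center_values(directory, row=144, col=191, pattern="*.csv"):
    """Return a DataFrame with one row per CSV file and its value at (row, col).

    Hypothetical helper. Assumes every file is a bare, comma-separated
    numeric grid (no header row, no index column).
    """
    records = []
    # sorted() gives a stable, name-ordered listing; glob order is not guaranteed
    for path in sorted(Path(directory).glob(pattern)):
        # header=None keeps the first line of the file as data, not column names
        frame = pd.read_csv(path, header=None)
        # .iloc is purely positional, so row/col are 0-based offsets
        records.append((path.name, frame.iloc[row, col]))
    return pd.DataFrame(records, columns=["Full_name", "center_value"])
```

Calling `collect_center_values("/Programming/Proj1/Code/Image_Data")` would then return one row per file. Note that with 383 columns the exact middle column is index 191, while 288 rows have no single middle row, so 143 and 144 are equally close to the center. If your files use a different separator, pass the matching `sep` to `pd.read_csv`.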