Using glob to import txt files to an array for interpolation

Time:08-08

I currently have data (wavelength, flux) in six txt files. The wavelengths are the same in every file, but the fluxes differ. I import the files using pd.read_csv (as can be seen in the code), assign each flux a different name, and place the named fluxes in an array. Finally, I interpolate the fluxes against a temperature array. The code works, and with only six files writing it this way is fine. The problem is that I will soon have hundreds of txt files, so I need a better method.

How can I use glob to import the txt files, assign a different name to each flux (if that is necessary) and finally interpolate? Any help would be appreciated. Thank you.

import pandas as pd
import numpy as np
from scipy import interpolate

fcf = 0.0000001 # flux conversion factor
wcf = 10 #wave conversion factor
temperature = np.array([725,750,775,800,825,850])

# import files and assign column headers; blank to ignore spaces

c1p = pd.read_csv("../c/725.txt",sep=" ",header=None)
c1p.columns = ["blank","0","blank","blank","1"]
c2p = pd.read_csv("../c/750.txt",sep=" ",header=None)
c2p.columns = ["blank","0","blank","blank","1"]
c3p = pd.read_csv("../c/775.txt",sep=" ",header=None)
c3p.columns = ["blank","0","blank","blank","1"]
c4p = pd.read_csv("../c/800.txt",sep=" ",header=None)
c4p.columns = ["blank","0","blank","blank","1"]
c5p = pd.read_csv("../c/825.txt",sep=" ",header=None)
c5p.columns = ["blank","0","blank","blank","1"]
c6p = pd.read_csv("../c/850.txt",sep=" ",header=None)
c6p.columns = ["blank","0","blank","blank","1"]

wave = np.array(c1p['0']/wcf)

c1fp = np.array(c1p['1']*fcf)
c2fp = np.array(c2p['1']*fcf)
c3fp = np.array(c3p['1']*fcf)
c4fp = np.array(c4p['1']*fcf)
c5fp = np.array(c5p['1']*fcf)
c6fp = np.array(c6p['1']*fcf)

cfp = np.array([c1fp,c2fp,c3fp,c4fp,c5fp,c6fp])

flux_int = interpolate.interp1d(temperature,cfp,axis=0,kind='linear',bounds_error=False,fill_value='extrapolate')

My attempt so far: I think I have loaded the files into a list using glob, as follows.

import pandas as pd
import numpy as np
from scipy import interpolate
import glob

c_list=[]

path = "../c/*.*"

for file in glob.glob(path): 
    print(file)
    c = pd.read_csv(file,sep=" ",header=None)
    c.columns = ["blank","0","blank","blank","1"]
    c_list.append(c)

I am still unsure how to extract just the fluxes into an array in order to interpolate. I will continue to post my attempts.

My updated code


fcf = 0.0000001
import pandas as pd
import numpy as np
from scipy import interpolate
import glob

c_list=[]

path = "../c/*.*"

for file in glob.glob(path):
    print(file)
    c = pd.read_csv(file,sep=" ",header=None)
    c.columns = ["blank","0","blank","blank","1"]
    c = c['1']*fcf
    c_list.append(c)
    
fluxes = np.array(c_list)

temperature = np.array([7250,7500,7750,8000,8250,8500])
flux_int =interpolate.interp1d(temperature,fluxes,axis=0,kind='linear',bounds_error=False,fill_value='extrapolate')

When I run this code I get the following error

raise ValueError("x and y arrays must be equal in length along "

ValueError: x and y arrays must be equal in length along interpolation axis.

I think the line that needs correcting is fluxes = np.array(c_list). This gives one list of all the fluxes, but I need a separate list of fluxes from each file. How is this done?
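The error message says that the length of temperature differs from the length of fluxes along axis 0, which is the number of files that were loaded. One possible cause, not confirmed in the post, is that the pattern "../c/*.*" matches more than six files. The sketch below shows a quick way to test for that mismatch; the helper name axis_lengths_match and the zero-filled arrays are purely illustrative.

```python
import numpy as np


def axis_lengths_match(temperature, fluxes):
    """Return True when interp1d(temperature, fluxes, axis=0) can pair them up."""
    fluxes = np.asarray(fluxes)
    # Axis 0 of fluxes must have one entry per temperature.
    return fluxes.ndim >= 1 and fluxes.shape[0] == len(temperature)


temperature = np.array([7250, 7500, 7750, 8000, 8250, 8500])
six_files = np.zeros((6, 4))    # 6 spectra, 4 wavelengths each (dummy data)
seven_files = np.zeros((7, 4))  # what one extra matched file would produce

print(axis_lengths_match(temperature, six_files))    # True
print(axis_lengths_match(temperature, seven_files))  # False
```

Printing fluxes.shape right after fluxes = np.array(c_list) gives the same information for the real data.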

Final attempt

import pandas as pd
import numpy as np
from scipy import interpolate
import glob

c_list=[]
path = "../c/*.*"
for file in glob.glob(path):
    print(file)
    c = pd.read_csv(file,sep=" ",header=None)
    c.columns = ["blank","0","blank","blank","1"]
    c = c['1']* 0.0000001
    c_list.append(c)
    

c1=np.array(c_list[0])
c2=np.array(c_list[1])
c3=np.array(c_list[2])
c4=np.array(c_list[3])
c5=np.array(c_list[4])
c6=np.array(c_list[5])

fluxes = np.array([c1,c2,c3,c4,c5,c6])
temperature = np.array([7250,7500,7750,8000,8250,8500])
flux_int = interpolate.interp1d(temperature,fluxes,axis=0,kind='linear',bounds_error=False,fill_value='extrapolate')

This code works, but I am still not sure about

c1=np.array(c_list[0])
c2=np.array(c_list[1])
c3=np.array(c_list[2])
c4=np.array(c_list[3])
c5=np.array(c_list[4])
c6=np.array(c_list[5])

Is there a better way to write this?

CodePudding user response:

Here are two things you can do:

  1. Instead of c = c['1']* 0.0000001

    try c = c['1'].to_numpy() * 0.0000001

    This builds a list of NumPy arrays rather than a list of pandas Series.

  2. When constructing fluxes, you can then just do fluxes = np.array(c_list), so the separate variables c1 to c6 are not needed.
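Putting both suggestions together, a minimal sketch for any number of files might look like the following. It rests on assumptions that go beyond the answer: each file is named after its temperature (as in "../c/725.txt"), every file has the same five space-separated columns used in the question, and the pattern is narrowed to "*.txt" so that no other files are picked up. The function names load_spectra and build_interpolator are hypothetical.

```python
from pathlib import Path
import glob

import numpy as np
import pandas as pd
from scipy import interpolate

FCF = 1e-7  # flux conversion factor, as in the question
WCF = 10    # wavelength conversion factor, as in the question


def load_spectra(pattern):
    """Return (temperature, wave, fluxes) for every file matching pattern.

    Assumes each file is named after its temperature (e.g. 725.txt) and
    has the five space-separated columns used in the question.
    """
    # Sort by the temperature in the file name, because glob makes no
    # promise about the order in which files are returned.
    files = sorted(glob.glob(pattern), key=lambda f: float(Path(f).stem))
    temperature = np.array([float(Path(f).stem) for f in files])

    wave = None
    flux_rows = []
    for f in files:
        df = pd.read_csv(f, sep=" ", header=None)
        if wave is None:
            # Wavelengths are the same in every file, so read them once.
            wave = df.iloc[:, 1].to_numpy() / WCF
        flux_rows.append(df.iloc[:, 4].to_numpy() * FCF)

    fluxes = np.array(flux_rows)  # shape: (n_files, n_wavelengths)
    return temperature, wave, fluxes


def build_interpolator(temperature, fluxes):
    """Linear interpolation of the spectra along the temperature axis."""
    return interpolate.interp1d(temperature, fluxes, axis=0, kind="linear",
                                bounds_error=False, fill_value="extrapolate")


# Example usage (path taken from the question):
# temperature, wave, fluxes = load_spectra("../c/*.txt")
# flux_int = build_interpolator(temperature, fluxes)
# spectrum_at_760 = flux_int(760)
```

Because the temperature array is built from the same file list as the fluxes, the two always have the same length along axis 0, which avoids the ValueError from the earlier attempt.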
