I've written code that loops over all the HDF5 files in a directory, prints the table from each file, plots a figure for each table, and then prints the area under the curve for each. Here is the code:
import os
directory = '/Users/xx'
for filename in os.listdir(directory):
    if filename.endswith(".hdf5"):
        xdata = file.get('data')
        xdata = np.array(xdata)
        xdata_df = pd.DataFrame(xdata)
        table1 = pd.DataFrame(xdata_df).reset_index()
        print(table1)
        x = table1["index"]
        y = table1[0]
        plt.figure(figsize=(10, 10))
        plt.rcParams.update({'font.size': 20})
        figure1 = plt.plot(x, y)
        # Compute the area using the composite trapezoidal rule.
        area = trapz(y, dx=100000)
        print("trapz area =", area)
        # Compute the area using the composite Simpson's rule.
        area = simps(y, dx=100000)
        print("simpsons area =", area)
        continue
    else:
        continue
However, my code seems to be running through the directory (15 files) but printing the exact same table, figure, and area under the curve 15 times. Does anyone know why this may be happening?
CodePudding user response:
Short answer: to get the Y values, you should use y = table1[1], not y = table1[0]. You read the X values as x = table1["index"]; you should use x = table1[0]. Also, do you realize you aren't using x when you call trapz() and simps()? You are creating two dataframes, xdata_df and table1, and only using table1. Why? If you just need the X/Y data, you can read the values directly from the dataset (dataframes are not required).
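On the point about x: when dx is passed, trapz() and simps() assume evenly spaced samples and never look at the x column. Below is a minimal sketch of the alternative, assuming SciPy 1.6 or later (where the functions are named trapezoid and simpson) and using made-up sample values rather than the data from the question:

```python
import numpy as np
from scipy.integrate import simpson, trapezoid

# Made-up samples of y = x**2 on [0, 2]; not the data from the question.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = x ** 2

# Passing x lets the integrator use the real sample positions.
area_trapz = trapezoid(y, x=x)
area_simps = simpson(y, x=x)

# Without x (or dx) the spacing defaults to 1, which silently rescales the area.
area_unit_spacing = trapezoid(y)

print("trapezoid with x =", area_trapz)
print("simpson with x =", area_simps)
print("trapezoid with default spacing =", area_unit_spacing)
```

The exact integral of x**2 over [0, 2] is 8/3, so comparing the three printed values shows how much the assumed spacing matters.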
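Note: the code above is missing a call to h5py.File() to open each H5 file. As posted, file is never reassigned inside the loop, so every iteration reads the same dataset, which would explain the identical output for all 15 files.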
Finally, you can simplify and clean up your code as follows:
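import glob
import h5py
# Assumption: trapz and simps come from SciPy, which now names them trapezoid and simpson.
from scipy.integrate import trapezoid as trapz, simpson as simps

directory = '/Users/xx'  # same directory as in the question
for filename in glob.iglob(f'{directory}/*.hdf5'):
    # Open each file inside the loop so every iteration reads its own data.
    with h5py.File(filename, 'r') as file:
        xdata = file['data'][()]
        x = xdata[:, 0]  # or x = file['data'][:, 0]
        y = xdata[:, 1]  # or y = file['data'][:, 1]
        # Compute the area using the composite trapezoidal rule.
        area = trapz(y, dx=100000)
        print("trapz area =", area)
        # Compute the area using the composite Simpson's rule.
        area = simps(y, dx=100000)
        print("simpsons area =", area)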
Or, if you prefer to use dataframes:
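import glob
import h5py
import pandas as pd
# Assumption: trapz and simps come from SciPy, which now names them trapezoid and simpson.
from scipy.integrate import trapezoid as trapz, simpson as simps

directory = '/Users/xx'  # same directory as in the question
for filename in glob.iglob(f'{directory}/*.hdf5'):
    # Open each file inside the loop so every iteration reads its own data.
    with h5py.File(filename, 'r') as file:
        xdata = file['data'][()]
        xdata_df = pd.DataFrame(xdata)
        table1 = pd.DataFrame(xdata_df).reset_index()
        x = table1[0]  # first dataset column
        y = table1[1]  # second dataset column
        # Compute the area using the composite trapezoidal rule.
        area = trapz(y, dx=100000)
        print("trapz area =", area)
        # Compute the area using the composite Simpson's rule.
        area = simps(y, dx=100000)
        print("simpsons area =", area)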
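One way to check that each file is really being read on its own is to write a few small HDF5 files with known contents and loop over them. The sketch below does that in a temporary directory. The file names, the load_xy helper, and the two-column layout of the 'data' dataset are assumptions made for illustration, and it uses scipy.integrate.trapezoid (SciPy 1.6 or later).

```python
import glob
import os
import tempfile

import h5py
import numpy as np
from scipy.integrate import trapezoid


def load_xy(path, dataset="data"):
    """Hypothetical helper: read a two-column dataset and return (x, y)."""
    with h5py.File(path, "r") as f:
        arr = f[dataset][()]  # copy the whole dataset into memory
    return arr[:, 0], arr[:, 1]


areas = {}
with tempfile.TemporaryDirectory() as tmp:
    # Write three test files holding y = 1*x, y = 2*x, y = 3*x on [0, 1].
    for i in range(3):
        x = np.linspace(0.0, 1.0, 5)
        y = (i + 1) * x
        with h5py.File(os.path.join(tmp, f"sample_{i}.hdf5"), "w") as f:
            f.create_dataset("data", data=np.column_stack([x, y]))

    # Each iteration opens its own file, so each area should differ.
    for path in sorted(glob.glob(os.path.join(tmp, "*.hdf5"))):
        x, y = load_xy(path)
        areas[os.path.basename(path)] = float(trapezoid(y, x=x))

for name, area in areas.items():
    print(name, "trapezoid area =", area)
```

If the three printed areas were identical, that would point to the loop reusing one file handle instead of opening each file.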