(code correction) Read multiple files in different directories

I have a number of files like this:

Inside each folder is 3 more like this:

Now inside each of these folders is a .txt file that looks like this:

For each of the .txt files I need the value from the 6th column (circled in red), and I am only interested in the lines that start with cope1, cope2, cope3, cope4 or cope5 (highlighted in blue). Everything else can be ignored.

I can't use pandas, so I got this code that uses NumPy, but when I run it the output shows that each group contains only one file instead of 10.

This is the code I'm using:

import os
import numpy as np

li = []

# Traverses thru the root folder 'roi_data' tree and opens .txt files.
for root, dirs, files in os.walk('roi_data'):
    for name in files:
        # Opens .txt file as numpy array and uses the second and the sixth columns.
        file_path = os.path.join(root, name)
        arr = np.loadtxt(file_path, delimiter=' ', usecols=[1, 5], dtype=str)
        
        # Filters out rows except those which contains 'cope'.
        # Adds ROI columns based on the file dir.
        arr = arr[np.char.startswith(arr[:, 0], 'stats/cope')]
        roi = np.full(fill_value=root.split("/")[-1], shape=(5, 1))
        arr = np.concatenate((roi, arr), axis=1)
        
        # Adds files path to distinguish for which file the calculation is done.
        # Appends the array to the list.
        file_path = np.full(fill_value=file_path, shape=(5, 1))
        arr = np.concatenate((arr, file_path), axis=1)
        li.append(arr)

# Concatenates all extracted arrays.
# Calculates the requested metrics and builds the result_di
combined_arr = np.array(li).reshape((-1, 4))
groups = (np.char.array(combined_arr[:, 0])
          + '_' + np.char.array(combined_arr[:, 1])).reshape((-1, 1))
combined_arr = np.concatenate((groups, combined_arr), axis=1)
result_di = dict()
for group in set(combined_arr[:, 0]):
    group_slice = combined_arr[combined_arr[:, 0] == group]
    values = (group_slice[:, 3].astype(float).mean(), 
              group_slice[:, 3].astype(np.float64).std(ddof=1),
              group_slice[:, 3].astype(np.float64).shape[0])
    result_di[group] = values

result_di =  dict(sorted(result_di.items()))
print(result_di)

This is part of the output:

{'roi_data\\01\\ffa_stats/cope1': (0.2577, nan, 1), 'roi_data\\01\\ffa_stats/cope2': (0.2311, nan, 1), 'roi_data\\01\\ffa_stats/cope3': (0.6393, nan, 1),...

And this is how the output should be:

{'ffa_stats/cope1': (0.76427, 0.36723498396046694, 10),
 'ffa_stats/cope2': (0.7036800000000001, 0.4011380360923157, 10),
 'ffa_stats/cope3': (1.0842100000000001, 0.39293685511938314, 10),
 'ffa_stats/cope4': (0.511365, 0.394306610851392, 10),
 'ffa_stats/cope5': (0.92214, 0.4897486570794361, 10),

I would really appreciate some help understanding what is wrong with the code. In case it is necessary to test my code I share below a link to the files: https://wetransfer.com/downloads/9120b0776ba711364579fc3b4c1374c520230106104247/654a21

CodePudding user response:

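In roi = np.full(fill_value=root.split("/")[-1], shape=(5, 1)) the path is split on "/", but on Windows os.walk returns paths separated by "\". The split therefore returns the whole path (for example roi_data\01\ffa), so every file ends up in its own group of size 1, and the sample standard deviation of a single value is nan. Change the line to roi = np.full(fill_value=root.split("\\")[-1], shape=(5, 1)); with that change it worked for me.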
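
If the script also has to run on Linux or macOS, hard-coding "\\" just moves the problem. A minimal sketch of a separator-independent alternative is shown below; roi_label is a hypothetical helper name, not part of the original code, and the sample paths are modelled on the output shown in the question.

```python
import ntpath


def roi_label(root):
    """Return the last folder name of a path, whichever separator it uses."""
    # ntpath treats both "\\" and "/" as separators, so the same call
    # handles paths written in Windows style or POSIX style.
    # Trailing separators are stripped first so the last component is not empty.
    return ntpath.basename(root.rstrip("\\/"))


print(roi_label("roi_data\\01\\ffa"))  # ffa
print(roi_label("roi_data/01/ffa"))    # ffa
```

Inside the loop from the question, where root comes from os.walk on the same machine, os.path.basename(root) should be enough, since os.path already follows the separator of the operating system the script is running on.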