So the first part of this question has been asked many times, and the best answer I found was here:
Perhaps I could just add a column holding the participant ID (ucsd1, etc.) so that each participant can be identified.
Here's code that I've gotten to work for Excel files:
import glob

import pandas as pd

path = r"/Users/jamesades/desktop/Watch_data_1/Re__Personalized_MH_data_call"
all_files = glob.glob(path + "/*.xlsx")  # every .xlsx file in the folder

li = []
for filename in all_files:
    df = pd.read_excel(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)  # stack all files row-wise
Thanks much!
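If the goal is to keep the identifier as text (for example ucsd1) rather than a bare number, one option is to pass the file stems as keys to pd.concat and then turn that index level into an ordinary column. This is a minimal sketch, assuming the participant ID is simply the file name without its extension (e.g. ucsd1.xlsx). concat_with_ids and the heart_rate column are hypothetical, and the function works on already-loaded DataFrames so it can be tried without the Excel files.

```python
from pathlib import Path

import pandas as pd


def concat_with_ids(frames_by_file):
    """Stack DataFrames and record which file each row came from.

    frames_by_file maps a file path to the DataFrame read from it.
    Assumption (hypothetical naming): the participant ID is the file stem,
    e.g. ".../ucsd1.xlsx" -> "ucsd1".
    """
    ids = [Path(name).stem for name in frames_by_file]
    combined = pd.concat(
        list(frames_by_file.values()),
        keys=ids,                     # outer index level = participant ID
        names=["participant", None],  # name that level so it can become a column
    )
    # Move the "participant" level into a regular column, then renumber rows 0..n-1.
    return combined.reset_index(level="participant").reset_index(drop=True)


if __name__ == "__main__":
    # Hypothetical in-memory stand-ins for two Excel files.
    frames = {
        "Watch_data_1/ucsd1.xlsx": pd.DataFrame({"heart_rate": [60, 62]}),
        "Watch_data_1/ucsd2.xlsx": pd.DataFrame({"heart_rate": [70]}),
    }
    print(concat_with_ids(frames))
```

With the variables from the question, the input would be built as {f: pd.read_excel(f) for f in all_files}. Using keys lets pandas do the labelling in one call instead of adding the column inside the loop.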
Answer:
If I understand you correctly, it's simple:
import glob
import os  # <-------------- Add this line
import re  # <-------------- Add this line

import pandas as pd

path = r"/Users/jamesades/desktop/Watch_data_1/Re__Personalized_MH_data_call"
all_files = glob.glob(path + "/*.xlsx")

li = []
for filename in all_files:
    df = pd.read_excel(filename, index_col=None, header=0)
    # Search only the file name: the folder "Watch_data_1" also contains a digit
    participant_number = int(re.search(r'(\d+)', os.path.basename(filename)).group(1))  # <-------------- Add this line
    df['participant_number'] = participant_number  # <-------------- Add this line
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
That way, each DataFrame loaded from an Excel file will have a column called participant_number, and in every row of that DataFrame its value will be the number found in the name of the file the DataFrame was loaded from.
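The same idea can be sketched as two small functions that are testable without the Excel files. This assumes each file name carries the participant number as its first run of digits (e.g. ucsd1.xlsx), and it searches only the last path component, because the folder Watch_data_1 in the question's path also contains a digit that would otherwise be matched first. participant_number_from and tag_and_concat are hypothetical helper names, not part of pandas.

```python
import re
from pathlib import Path

import pandas as pd


def participant_number_from(filename):
    """Return the first run of digits in the file *name* as an int.

    Only the last path component is searched, so digits in folder names
    (such as "Watch_data_1") are ignored. Assumes names like "ucsd1.xlsx".
    """
    match = re.search(r"\d+", Path(filename).name)
    if match is None:
        raise ValueError(f"no participant number in {filename!r}")
    return int(match.group())


def tag_and_concat(frames_by_file):
    """Add a participant_number column to each DataFrame, then stack them."""
    tagged = [
        # assign returns a new DataFrame, so the inputs are left unchanged
        df.assign(participant_number=participant_number_from(name))
        for name, df in frames_by_file.items()
    ]
    return pd.concat(tagged, axis=0, ignore_index=True)
```

With the question's variables this would be called as frame = tag_and_concat({f: pd.read_excel(f) for f in all_files}). Raising ValueError gives a clearer message than the AttributeError that .group() on None would produce when a file name has no digits.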