I currently have the following dataframe:
file_path | file_name |
---|---|
/Users/user/Dropbox/SEC investigat... | _0000886982_18795_2687.txt |
/Users/user/Dropbox/SEC investigat... | _0001068875_16706_4152.txt |
... | ... |
Each file_path points to a specific text file. I am trying to create a new column containing the text of the file at the corresponding file_path. So far I have the following code, but it raises an error (TypeError: expected str, bytes or os.PathLike object, not Series):
pd_00['text'] = open(pd_00['file_path'], 'r')
CodePudding user response:
open() expects a single path, not a whole Series, so you can use apply to run a function on every element in the column:
def open_file(path):
    with open(path) as f:
        return f.read()

df['text'] = df['file_path'].apply(open_file)
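If some paths might be missing or the files are not cleanly encoded, a slightly more defensive variant of the same idea could look like the sketch below. The helper name read_text_or_none, the UTF-8 encoding, and the temporary demo file are assumptions made for illustration, not something stated in the question.

```python
import os
import tempfile

import pandas as pd


def read_text_or_none(path, encoding="utf-8"):
    """Return the file's text, or None if the file cannot be opened."""
    try:
        # errors="replace" swaps undecodable bytes for a placeholder
        # character instead of raising UnicodeDecodeError.
        with open(path, encoding=encoding, errors="replace") as f:
            return f.read()
    except OSError:
        # Covers missing files, permission problems, and similar I/O errors.
        return None


# Demo on a throwaway file so the sketch runs anywhere (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "example.txt")
    with open(good, "w", encoding="utf-8") as f:
        f.write("hello")

    demo = pd.DataFrame({"file_path": [good, os.path.join(tmp, "missing.txt")]})
    demo["text"] = demo["file_path"].apply(read_text_or_none)
```

Rows whose file could not be read end up with a missing value in the text column, so they can be found afterwards with demo["text"].isna().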
CodePudding user response:
First, create a function that reads a file and returns its content:
def read(filename):
    with open(filename, 'r') as f:
        return f.read()
Then you can use it with apply:
df['content'] = df['file_path'].apply(read)
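As an alternative that skips the helper function, pathlib's Path.read_text can be called inside a list comprehension. This is only a sketch: the temporary sample file, the frame name, and the UTF-8 encoding are assumptions for the demo.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"  # hypothetical file for the demo
    sample.write_text("some filing text", encoding="utf-8")

    frame = pd.DataFrame({"file_path": [str(sample)]})
    # Path.read_text opens, reads, and closes the file in one call.
    frame["content"] = [
        Path(p).read_text(encoding="utf-8") for p in frame["file_path"]
    ]
```

Either way, every file is read into memory at once, so this works best when the files are reasonably small.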