Python - Adding text column to DataFrame using file paths

Time:08-17

I currently have the following dataframe:

file_path                              file_name
/Users/user/Dropbox/SEC investigat...  _0000886982_18795_2687.txt
/Users/user/Dropbox/SEC investigat...  _0001068875_16706_4152.txt
...                                    ...

Each file_path points to a specific text file. I am trying to create a new column, text, that holds the contents of the file at the corresponding file_path. So far I have the following code, but it raises an error (TypeError: expected str, bytes or os.PathLike object, not Series):

pd_00['text'] = open(pd_00['file_path'], 'r')
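The error comes from the types involved: open() takes a single path (a str, bytes or os.PathLike object), but pd_00['file_path'] is an entire Series. A minimal reproduction, using a small hypothetical Series in place of the real, truncated paths:

```python
import pandas as pd

# Hypothetical stand-in for pd_00['file_path']; the real paths are truncated above
paths = pd.Series(["first.txt", "second.txt"], name="file_path")

raised = False
try:
    # open() expects ONE path, so passing the whole Series fails
    # during argument conversion, before any file is touched
    open(paths, "r")
except TypeError as exc:
    raised = True
    print(exc)
```

The fix is to call open() once per path, which is what both answers below do with apply.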

Answer:

You can use apply to run a function on every element of a column:

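def open_file(path):
    # Read one file and return its full contents as a string
    with open(path) as f:
        return f.read()

# apply calls open_file once per path; the returned strings form the new column
df['text'] = df['file_path'].apply(open_file)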
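A self-contained sketch of the same approach that can be run end to end. The two sample files, their contents and the temporary directory are hypothetical stand-ins for the SEC filings in the question:

```python
import tempfile
from pathlib import Path

import pandas as pd


def open_file(path):
    # Read one file and return its whole contents as a single string
    with open(path) as f:
        return f.read()


# Hypothetical sample files in a temporary directory,
# standing in for the truncated Dropbox paths in the question
tmp_dir = Path(tempfile.mkdtemp())
contents = {"a.txt": "first filing", "b.txt": "second filing"}
for name, text in contents.items():
    (tmp_dir / name).write_text(text)

df = pd.DataFrame({
    "file_path": [str(tmp_dir / name) for name in contents],
    "file_name": list(contents),
})

# Series.apply calls open_file once per row and collects the results
df["text"] = df["file_path"].apply(open_file)
print(df[["file_name", "text"]])
```

A plain list comprehension, [open_file(p) for p in df["file_path"]], assigned to the column would give the same result; apply simply keeps it in pandas style.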

Answer:

First, create a function that reads a file and returns its content:

def read(filename):
    # Open in text mode ('r') and return the whole file as one string
    with open(filename, 'r') as f:
        return f.read()

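Then apply it to the file_path column. (The file_name column holds only the bare file name, so opening it works only if the current working directory happens to be the folder that contains the files.)

df['content'] = df['file_path'].apply(read)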
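With either answer, a single missing or non-UTF-8 file makes apply raise, and no column is created. A hedged variant of read(), assuming you would rather get None for a missing file and replacement characters for undecodable bytes; the encoding default and this error policy are assumptions, not part of the answer above:

```python
import tempfile
from pathlib import Path

import pandas as pd


def read(filename, encoding="utf-8"):
    # Hypothetical variant of read(): decode with an explicit encoding,
    # swap undecodable bytes for U+FFFD, and return None for missing files
    try:
        return Path(filename).read_text(encoding=encoding, errors="replace")
    except FileNotFoundError:
        return None


# Hypothetical demo: one file with a non-UTF-8 byte, one path that does not exist
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "filing.txt").write_bytes(b"caf\xe9 filing")
df = pd.DataFrame({
    "file_path": [str(tmp_dir / "filing.txt"), str(tmp_dir / "missing.txt")],
})

df["content"] = df["file_path"].apply(read)
print(df["content"].tolist())
```

Rows whose file could not be found can then be located with df["content"].isna().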
