Using pandas read_table without separators - CodePudding

Right now I'm trying to convert the Juliet dataset into a pandas DataFrame. I converted all the .cpp, .c and .h files into .txt files and now want to load those text files into a DataFrame. I'm using the pandas read_table function for this, but I want each file to occupy a single cell. Is there any way to turn off separators in this function so that each .txt file ends up in one cell without being split?

CodePudding user response:

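IIUC, you don't need read_table here (it parses delimited rows and columns); you can read each file as one string and build the DataFrame yourself, with something like: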

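import glob

import pandas as pd

# Map each filename to the full text of that file.
data = {}
for filename in glob.glob('*.txt'):
    with open(filename) as fp:
        data[filename] = fp.read()

# One row per file: the filename is the index, the text is in 'content'.
df = pd.DataFrame.from_dict(data, columns=['content'], orient='index')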

If you have files with the same name, collect the contents in a list instead, so that no entry overwrites another:

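import glob

import pandas as pd

# Collect the full text of each file; filenames are not kept.
data = []
for filename in glob.glob('*.txt'):
    with open(filename) as fp:
        data.append(fp.read())

# One row per file with a default integer index.
df = pd.DataFrame(data, columns=['content'])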
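
If the converted files sit in nested folders, a variation on the snippets above might look like the sketch below. It is only an illustration: the helper name load_text_files, the UTF-8 encoding, and the errors="replace" setting are assumptions, not something stated in the question. Using each file's path relative to the root folder as the index keeps files with the same base name apart while still recording where each one came from.

```python
from pathlib import Path

import pandas as pd


def load_text_files(root, pattern="*.txt"):
    """Return a DataFrame with one row per file and its full text in 'content'.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    root = Path(root)
    records = {}
    # rglob descends into subdirectories; sorted() keeps the row order stable.
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        # The path relative to root is unique even when base names repeat.
        key = path.relative_to(root).as_posix()
        # Assumption: files are (mostly) UTF-8; undecodable bytes are replaced
        # instead of raising an error.
        records[key] = path.read_text(encoding="utf-8", errors="replace")
    return pd.DataFrame({"content": pd.Series(records, dtype="object")})
```

Calling load_text_files("some_folder") (the folder name is a placeholder) would then give one row per file, and df.loc[relative_path, "content"] returns that file's whole text, newlines included.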