Right now I'm trying to convert the Juliet dataset into a pandas DataFrame. I converted all the .cpp, .c and .h files into .txt files and am now trying to load those text files into a pandas DataFrame. To do this, I'm using the read_table function in pandas, but I want every file to have its own cell. Is there any way to get rid of separators for this function so that each .txt file gets one cell without being split up?
CodePudding user response:
IIUC, you can use something like:
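import glob
import pandas as pd

data = {}
for filename in glob.glob('*.txt'):
    # Read the whole file as one string so it ends up in a single cell
    with open(filename) as fp:
        data[filename] = fp.read()
# One row per file: the filename is the index, the text goes in 'content'
df = pd.DataFrame.from_dict(data, columns=['content'], orient='index')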
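If file names may repeat (for example, when collecting from several directories and keying by base name), or you don't need the names as the index, collect the contents in a list instead: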
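import glob
import pandas as pd

data = []
for filename in glob.glob('*.txt'):
    # Each file's full text becomes one list element, hence one row
    with open(filename) as fp:
        data.append(fp.read())
# Rows get the default integer index 0..n-1
df = pd.DataFrame(data, columns=['content'])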
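If the files sit in nested directories, as they likely do in a dataset like Juliet, a recursive variant may be more convenient. The following is only a sketch under some assumptions: `load_sources` is a hypothetical helper name, the `path` column and the default suffix list are illustrative choices, and the files are assumed to be mostly UTF-8 (undecodable bytes are replaced rather than raising). It also reads `.c`, `.cpp` and `.h` files directly, so the conversion to `.txt` may not be needed.

```python
from pathlib import Path

import pandas as pd


def load_sources(root, suffixes=(".c", ".cpp", ".h", ".txt")):
    """Return a DataFrame with one row per file: its relative path and full text.

    Hypothetical helper; the column names and default suffixes are assumptions.
    """
    root = Path(root)
    rows = []
    # rglob walks subdirectories too; sorting keeps the row order stable.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            # errors="replace" avoids a crash on bytes that are not valid UTF-8.
            text = path.read_text(encoding="utf-8", errors="replace")
            rows.append({"path": str(path.relative_to(root)), "content": text})
    return pd.DataFrame(rows, columns=["path", "content"])
```

Calling `load_sources("path/to/dataset")` (the directory name here is a placeholder) should give one row per file. Storing the relative path in a column rather than as the index keeps rows distinct even when two directories contain files with the same base name.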