I have a dataset called preprocessed_sample, stored in the following format:
preprocessed_sample.ftr.zstd
I am opening it with the following code:
df = pd.read_feather(filepath)
The output looks something like this:
index text
0 0 i really dont come across how i actually am an...
1 1 music has become the only way i am staying san...
2 2 adults are contradicting
3 3 exo are breathing 553 miles away from me. they...
4 4 im missing people that i met when i was hospit...
Finally, I would like to save this dataset under the name 'examples', with all these texts in txt format.
Update: @Tsingis, I would like each of the lines above to be saved as its own txt file. For example, the first line, 'i really dont come across how i actually am an...', would go into a file named 'line1.txt', and in the same way every other line would become a txt file in a folder called 'examples'.
CodePudding user response:
You can use the following code:
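import pathlib
# Create the output folder if it does not exist yet
data_dir = pathlib.Path('./examples')
data_dir.mkdir(exist_ok=True)
# Number the files from 1: line1.txt, line2.txt, ...
for i, text in enumerate(df['text'], 1):
    with open(data_dir / f'line{i}.txt', 'w', encoding='utf-8') as fp:
        fp.write(text)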
Output:
examples/
├── line1.txt
├── line2.txt
├── line3.txt
├── line4.txt
└── line5.txt
1 directory, 5 files
line1.txt:
i really dont come across how i actually am an...
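The same idea can be wrapped in a small reusable function. This is only a minimal sketch: the function name write_texts_to_files, the prefix parameter, and the sample strings are made up for illustration, and a temporary folder stands in for the real 'examples' folder.

```python
import tempfile
from pathlib import Path


def write_texts_to_files(texts, out_dir, prefix="line"):
    """Write each item of `texts` to its own file: line1.txt, line2.txt, ..."""
    out_dir = Path(out_dir)
    # parents=True also creates any missing intermediate folders.
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, text in enumerate(texts, start=1):
        path = out_dir / f"{prefix}{i}.txt"
        # str() guards against non-string cells such as NaN.
        path.write_text(str(text), encoding="utf-8")
        written.append(path)
    return written


# Stand-in for df["text"]; these sample strings are hypothetical.
sample_texts = ["adults are contradicting", "music helps me stay sane"]
demo_dir = Path(tempfile.mkdtemp()) / "examples"
created = write_texts_to_files(sample_texts, demo_dir)
print([p.name for p in created])
```

With the real data, passing df['text'] and 'examples' as the two arguments should behave the same way, since the function only needs something it can iterate over.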
CodePudding user response:
Another way is to use the pandas built-ins itertuples and to_csv:
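import pandas as pd
# Assumes the 'examples' folder already exists
for row in df.itertuples():
    # row.index is the value of the 'index' column; add 1 so names start at line1.txt
    pd.Series(row.text).to_csv(f"examples/line{row.index + 1}.txt",
                               index=False, header=False)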
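One thing worth checking with the to_csv approach is that it applies CSV quoting, so a text containing a comma ends up wrapped in double quotes in the output file. The sketch below compares it with writing the string directly. The sample text and file names are made up, and a temporary folder is used instead of 'examples'.

```python
import tempfile
from pathlib import Path

import pandas as pd

out_dir = Path(tempfile.mkdtemp())
text = "hello, world"  # hypothetical sample containing a comma

# to_csv applies CSV quoting, so the comma-containing text gets wrapped in quotes.
csv_path = out_dir / "via_to_csv.txt"
pd.Series([text]).to_csv(csv_path, index=False, header=False)
csv_content = csv_path.read_text(encoding="utf-8").strip()

# Writing the string directly leaves the text unchanged.
plain_path = out_dir / "via_write_text.txt"
plain_path.write_text(text, encoding="utf-8")
plain_content = plain_path.read_text(encoding="utf-8")

print(repr(csv_content))
print(repr(plain_content))
```

If the texts never contain commas, quotes, or line breaks, both approaches should give the same file content apart from the trailing newline that to_csv adds.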