Clean the text data and save in csv format using Python

Time:02-22

I have a text file of about 7000 sentences, one sentence per line. A sample of the file's contents is given below. I want to clean the data and change its format using Python.

(input.txt)

I\PP.sg.n.n am\VM.3.fut.sim.dcl.fin.n.n.n going\VER.0.gen.n.n to\NC.0.0.n.n school\JQ.n.n.crd .\PU
When\PPR.pl.1.0.n.n.n.n I\PP.0.y go\VM.0.0.0.0.nfn.n.n.n outside\NC.0.0.n.n ,\PU I\NST.0.n.n saw\NN.loc.n.n something\DAB.sg.y .\PU
I\PP.0.y eat\JQ.n.n.nnm rice\NC.0.loc.n.n .\PU

I want to convert the data above into the CSV format shown below.

(input.csv)

Sentences                                  Tags
I am going to school .                     PP VM VER NC JQ PU
When I go outside , I saw something .      PPR PP VM NC PU NST NN DAB PU
I eat rice .                               PP JQ NC PU

I have tried several approaches, but none of them produces the desired format, and I am really confused. Any help would be greatly appreciated. Thanks in advance.

CodePudding user response:

Python Code:

txt = r"""
I\PP.sg.n.n am\VM.3.fut.sim.dcl.fin.n.n.n going\VER.0.gen.n.n to\NC.0.0.n.n school\JQ.n.n.crd .\PU
When\PPR.pl.1.0.n.n.n.n I\PP.0.y go\VM.0.0.0.0.nfn.n.n.n outside\NC.0.0.n.n ,\PU I\NST.0.n.n saw\NN.loc.n.n something\DAB.sg.y .\PU
I\PP.0.y eat\JQ.n.n.nnm rice\NC.0.loc.n.n .\PU
"""

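# Process each sentence (one per line).
for line in txt.strip().split('\n'):
    words, tags = [], []
    for wordtag in line.strip().split():
        # Split "word\TAG.features" at the first backslash.
        splits = wordtag.split('\\', 1)
        words.append(splits[0])
        # Keep only the main tag, i.e. the part before the first dot.
        tags.append(splits[1].split('.')[0])
    # Print one CSV row with both fields quoted.
    print(f"\"{' '.join(words)}\",\"{' '.join(tags)}\"")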

Output:

"I am going to school .","PP VM VER NC JQ PU"
"When I go outside , I saw something .","PPR PP VM NC PU NST NN DAB PU"
"I eat rice .","PP JQ NC PU"
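
The snippet above works on a hard-coded string and only prints the rows, while the question asks to read a file and save a CSV. A minimal sketch of that step follows, assuming the file is UTF-8 encoded and the desired header row is `Sentences,Tags`. The helper names `split_tagged_line` and `convert` are hypothetical, chosen here for illustration. It uses the standard `csv` module so that commas inside a sentence are quoted correctly.

```python
import csv


def split_tagged_line(line):
    """Return (sentence, tags) for one line of tagged tokens."""
    words, tags = [], []
    for token in line.split():
        # partition splits at the first backslash; a token without a
        # backslash keeps the whole token as the word and gets an empty tag.
        word, _, tag_info = token.partition("\\")
        words.append(word)
        # Keep only the main tag, i.e. the part before the first dot.
        tags.append(tag_info.split(".")[0])
    return " ".join(words), " ".join(tags)


def convert(src_path, dst_path):
    """Read tagged sentences from src_path and write a two-column CSV."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # QUOTE_ALL mirrors the fully quoted output printed above.
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        writer.writerow(["Sentences", "Tags"])
        for line in src:
            if line.strip():  # skip blank lines
                writer.writerow(split_tagged_line(line))
```

With the file names from the question, the call would be `convert("input.txt", "input.csv")`. Reading the file line by line keeps memory use low, although for about 7000 sentences either approach should be fine.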