How do I output results to a .csv file in Python?

Time:10-23

I am new to Python and would like to write a script that takes a .txt file as input and outputs the results to a .csv file.

The .txt files look as follows:

text:eub1
region:euboea
μενανδρεσεμεεποισε

I would like to write a script that creates a new row for each instance of μ or ν in the third line above. I also want each row to contain the text and region identifier. So the result should look like this:

text,region,letter
eub1,euboea,μ
eub1,euboea,ν
eub1,euboea,ν
eub1,euboea,μ

I don't really know where to start with the coding, so I'd be grateful for any advice on how to do this.

CodePudding user response:

Try:

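import pandas as pd

data = {}
# The input contains Greek letters, so read it explicitly as UTF-8
# (this assumes the file was saved as UTF-8).
with open("your_file.txt", "r", encoding="utf-8") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        if line.startswith("text:"):
            data["text"] = line.split(":", maxsplit=1)[-1]
        elif line.startswith("region:"):
            data["region"] = line.split(":", maxsplit=1)[-1]
        else:
            # Any other non-empty line is the inscription: keep every μ or ν.
            data["letter"] = [ch for ch in line if ch in "μν"]

# The single text and region values are repeated for each letter in the list.
df = pd.DataFrame(data)
print(df)

df.to_csv("data.csv", index=False)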

Prints:

   text  region letter
0  eub1  euboea      μ
1  eub1  euboea      ν
2  eub1  euboea      ν
3  eub1  euboea      μ

and saves data.csv:

text,region,letter
eub1,euboea,μ
eub1,euboea,ν
eub1,euboea,ν
eub1,euboea,μ

Content of your_file.txt:

text:eub1
region:euboea
μενανδρεσεμεεποισε
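
If you would rather not depend on pandas, the same rows can probably be produced with the standard library's csv module. This is only a minimal sketch, assuming the file layout shown above; the helper names letter_rows and write_csv are made up for illustration, and the sample text is typed in rather than read from disk:

```python
import csv
import io


def letter_rows(lines, wanted="μν"):
    """Yield (text, region, letter) for every wanted letter found."""
    text = region = None
    for line in map(str.strip, lines):
        if not line:
            continue
        if line.startswith("text:"):
            text = line.split(":", maxsplit=1)[1]
        elif line.startswith("region:"):
            region = line.split(":", maxsplit=1)[1]
        else:
            # Any other non-empty line is treated as the inscription.
            for ch in line:
                if ch in wanted:
                    yield text, region, ch


def write_csv(rows, out):
    """Write a header plus one CSV row per (text, region, letter) tuple."""
    writer = csv.writer(out)
    writer.writerow(["text", "region", "letter"])
    writer.writerows(rows)


# Sample input copied from the question, held in memory for the demo.
sample = "text:eub1\nregion:euboea\nμενανδρεσεμεεποισε\n"
buffer = io.StringIO()
write_csv(letter_rows(sample.splitlines()), buffer)
print(buffer.getvalue())
```

For a real run, pass an open file instead of the sample lines, for example open("your_file.txt", encoding="utf-8"), and open the output file with newline="" and encoding="utf-8", as the csv module documentation recommends.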

EDIT: To load from this file:

text:eub1
region:euboea
μενανδρεσεμεεποισε
text:eub2
region:xxx
μμμ
text:eub3
region:zzz
abc

you can try:

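import pandas as pd

data = {}
# Read explicitly as UTF-8 (assuming the file was saved as UTF-8).
with open("your_file.txt", "r", encoding="utf-8") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        # Each key collects one entry per record, in file order.
        if line.startswith("text:"):
            data.setdefault("text", []).append(line.split(":", maxsplit=1)[-1])
        elif line.startswith("region:"):
            data.setdefault("region", []).append(
                line.split(":", maxsplit=1)[-1]
            )
        else:
            # Store the matching letters of this record as a list.
            data.setdefault("letter", []).append(
                [ch for ch in line if ch in "μν"]
            )

# explode() gives each letter its own row; an empty list becomes NaN.
df = pd.DataFrame(data).explode("letter")
print(df)

df.to_csv("data.csv", index=False)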

Prints:

   text  region letter
0  eub1  euboea      μ
0  eub1  euboea      ν
0  eub1  euboea      ν
0  eub1  euboea      μ
1  eub2     xxx      μ
1  eub2     xxx      μ
1  eub2     xxx      μ
2  eub3     zzz    NaN

and saves data.csv:

text,region,letter
eub1,euboea,μ
eub1,euboea,ν
eub1,euboea,ν
eub1,euboea,μ
eub2,xxx,μ
eub2,xxx,μ
eub2,xxx,μ
eub3,zzz,
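
If the row for eub3 (a record with no μ or ν) should not appear in the output, one option is to drop the rows whose letter is missing before saving. A small sketch, assuming the dictionary has the same shape as the one built by the loop above; the values are typed in by hand here so the snippet runs on its own, and the variable names matched and csv_text are only illustrative:

```python
import pandas as pd

# Same shape as the dictionary built from the three-record file.
data = {
    "text": ["eub1", "eub2", "eub3"],
    "region": ["euboea", "xxx", "zzz"],
    "letter": [["μ", "ν", "ν", "μ"], ["μ", "μ", "μ"], []],
}

df = pd.DataFrame(data).explode("letter")

# explode() turns the empty list into a missing value; drop those rows
# and renumber the index so it no longer repeats per record.
matched = df.dropna(subset=["letter"]).reset_index(drop=True)

# With no path given, to_csv() returns the CSV text instead of writing a file.
csv_text = matched.to_csv(index=False)
print(csv_text)
```

Whether to keep or drop such records depends on what the .csv is for: keeping them shows which texts were searched, dropping them leaves only actual matches.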