Convert text file data into a dataframe

I have a dataset like this, stored in a .txt file:

للہ عمران خان کو ہماری بھی عمر لگائے ہم جیسے تو اس ملک میں اور بھی پچیس کروڑ ہیں مگر خان آپ جیسا تو یہاں دوسرا نہیں ۔۔۔ اللہ آپکی حفاظت فرمائے آمین

[Real,politics,sarcasm ,rise moral]

How can I convert this into a data frame with two columns, with the English text in column one and the Urdu text in column two?

Thanks!

CodePudding user response:

You have multiple text files, each holding data like this: the Urdu text, then the English text in brackets.

So start with a function that reads a single file of that type:

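def read_single_file(filename: str) -> tuple[str, str]:
    """Return (urdu, english) read from one file."""
    urdu = ""
    english = ""
    # Urdu text needs an explicit encoding; UTF-8 is assumed here.
    with open(filename, encoding="utf-8") as f:
        for line in f:
            line = line.strip()  # remove newlines and surrounding whitespace
            if not line:  # ignore empty lines
                continue
            if line.startswith("["):
                english = line.strip("[]")  # drop the enclosing brackets
            else:
                urdu = line
    return (urdu, english)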
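
If you later want the bracketed English part as separate labels rather than one string, a small helper could split it on commas. This is only a sketch: `split_tags` is a hypothetical name, and it assumes the labels are always comma-separated, as in your sample line.

```python
def split_tags(english: str) -> list[str]:
    """Split a comma-separated label string into a list of trimmed labels."""
    # Strip stray spaces around each label and skip empty pieces.
    return [tag.strip() for tag in english.split(",") if tag.strip()]


# The bracket contents from the sample file, after line.strip("[]")
tags = split_tags("Real,politics,sarcasm ,rise moral")
print(tags)  # ['Real', 'politics', 'sarcasm', 'rise moral']
```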

Then, loop over your files; I'll assume they're just *.txt:

import glob

results = [read_single_file(filename) for filename in glob.glob("*.txt")]

Now that you have a list of 2-tuples, you can just create a dataframe out of it:

import pandas as pd

df = pd.DataFrame(results, columns=["urdu", "english"])
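
Since you asked for English in column one and Urdu in column two, you can reorder the columns after building the frame. A minimal sketch, using made-up placeholder rows in place of the real `results` list:

```python
import pandas as pd

# Placeholder (urdu, english) tuples standing in for the real results list
results = [
    ("اردو متن ایک", "Real,politics,sarcasm ,rise moral"),
    ("اردو متن دو", "Fake,sports"),
]

df = pd.DataFrame(results, columns=["urdu", "english"])

# Select the columns in the order you want them to appear.
df = df[["english", "urdu"]]
print(df.columns.tolist())
```

Alternatively, you could swap the order in the tuple that `read_single_file` returns and pass `columns=["english", "urdu"]` directly.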