Create a dataframe using headings and text

Time:11-09

I have a file in the following format:

Heading 1

Some text here

Heading 2

Some text here

Heading 3

Some text here

I need to create a dataframe in the following format:

df =

Heading    text
Heading 1  Some text here
Heading 2  Some text here
Heading 3  Some text here

CodePudding user response:

You can try the code below:

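import pandas as pd

# Read the file, dropping blank lines and surrounding whitespace
with open('data.txt') as fp:
    data = [line.strip() for line in fp if line.strip()]

# Even positions hold headings, odd positions hold the text that follows
df = pd.DataFrame(list(zip(data[::2], data[1::2])), columns=['Heading', 'text'])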

Output:

>>> df
     Heading            text
0  Heading 1  Some text here
1  Heading 2  Some text here
2  Heading 3  Some text here
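
The pairing step in the snippet above can be checked without a file. This is a minimal sketch that assumes an in-memory string in place of data.txt; the names sample, headings, texts and rows are made up for illustration.

```python
# Hypothetical in-memory stand-in for data.txt (not from the original post)
sample = """Heading 1

Some text here

Heading 2

Some text here
"""

# Drop blank lines so headings and text strictly alternate
data = [line.strip() for line in sample.splitlines() if line.strip()]

headings = data[::2]   # positions 0, 2, 4, ...
texts = data[1::2]     # positions 1, 3, 5, ...

# zip pairs each heading with the text line that follows it
rows = list(zip(headings, texts))
print(rows)
```

Passing rows to pd.DataFrame with columns=['Heading', 'text'] gives the same kind of frame. Note that zip stops at the shorter input, so a trailing heading with no text would be silently dropped.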

Content of data.txt file:

Heading 1

Some text here

Heading 2

Some text here

Heading 3

Some text here

CodePudding user response:

Assuming the file has headings on odd rows and text on even rows:

First read all the lines into a NumPy array, then reshape it to n×2 so that each row holds one heading and its text. Finally, create a data frame from the reshaped array.

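import numpy as np
import pandas as pd

with open("data.dat", "r") as the_file:
    # Skip blank lines so headings and text strictly alternate
    data = np.array([d.strip() for d in the_file if d.strip()])

# Reshape to n x 2: one (heading, text) pair per row
df = pd.DataFrame(data.reshape((-1, 2)), columns=["Heading", "text"])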
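
To illustrate the reshape step, here is a small sketch on a hand-written list rather than a file; the list contents and the names lines, pairs and odd_ok are assumptions. It also shows that reshape raises ValueError when the number of lines is odd, which may be worth checking before building the data frame.

```python
import numpy as np

# Hypothetical non-empty lines, already stripped
lines = ["Heading 1", "Some text here", "Heading 2", "Some text here"]

# -1 lets NumPy infer the number of rows; each row holds (heading, text)
pairs = np.array(lines).reshape((-1, 2))
print(pairs.shape)

# An odd number of lines cannot be split into rows of two
try:
    np.array(lines[:3]).reshape((-1, 2))
    odd_ok = True
except ValueError:
    odd_ok = False
print(odd_ok)
```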

CodePudding user response:

The following, more explicit version could be adapted if you want to see how the parsing logic is structured.

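import pandas as pd

header = []
text = []
with open("sample.txt", "r") as f:
    for f_line in f:
        line = f_line.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith("Heading"):  # sample lines start with "Heading", not "Header"
            header.append(line)
        else:
            text.append(line)

# Both lists must have the same length, otherwise pandas raises ValueError
df = pd.DataFrame({'header': header, 'text': text})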