converting the contents of txt file to columns of pandas dataframe

Time:12-05

I have a .txt file like this:

12
21
23
1
23
42
12
0

Here <12,21,23> are features and <1> is the label; then <23,42,12> are features and <0> is the label, and so on. I want to create a pandas DataFrame from this text file, turning its single column into multiple columns. The DataFrame should have the format {column1,column2,column3,column4}, with no column names. Can someone please help me with this? Thanks.

CodePudding user response:

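import pandas as pd

df = dict()
features = list()
label = ''
filename = '.txt'
with open(filename) as fd:
    i = 0
    for line in fd:
        if i != 3:
            # The first three lines of each group are features
            features.append(line.strip())
            i += 1
        else:
            # The fourth line is the label; it becomes the column header
            label = line.strip()
            i = 0
            # Note: a repeated label overwrites the earlier column
            df[label] = features
            features = list()
df = pd.DataFrame(df)
df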

CodePudding user response:

import pandas as pd

with open(<FILEPATH>, "r") as f:
    lines = f.readlines()
    formatted = [int(line.strip()) for line in lines]  # Strip the newline and convert to int
    labels = formatted[3::4]
    features = list(zip(formatted[::4], formatted[1::4], formatted[2::4]))  # Extend this if there are more than three features

data = {}
for i, label in enumerate(labels):
    data[label] = list(features[i])
df = pd.DataFrame(data)

Comment if you have any questions or find any errors, and I will make amendments.

CodePudding user response:

You can use numpy first. You need to ensure that the number of values is a multiple of 4.

Each record as a column, with the label as header:

import numpy as np
import pandas as pd

a = np.loadtxt('file.txt').reshape((4,-1), order='F')
df = pd.DataFrame(a[:-1], columns=a[-1])

Output:

    1.0   0.0
0  12.0  23.0
1  21.0  42.0
2  23.0  12.0

Each record as a new row:

a = np.loadtxt('file.txt').reshape((-1,4))
df = pd.DataFrame(a)

Output:

      0     1     2    3
0  12.0  21.0  23.0  1.0
1  23.0  42.0  12.0  0.0
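
The reshape only works when the number of values is a multiple of 4, so it can help to check that up front. Here is a minimal sketch of that check, not a definitive implementation: the helper name `records_to_frame` and its `width` parameter are made up for illustration, and the in-memory `sample` stands in for the real file. It also assumes the values are integers, so it passes `dtype=int` to avoid the float output shown above.

```python
import io

import numpy as np
import pandas as pd


def records_to_frame(source, width=4):
    """Hypothetical helper: one value per line in, one record per row out."""
    # ndmin=1 keeps a single-value file as a 1-D array instead of a scalar
    values = np.loadtxt(source, dtype=int, ndmin=1)
    if values.size % width != 0:
        raise ValueError(
            f"expected a multiple of {width} values, got {values.size}"
        )
    # Each consecutive group of `width` values becomes one row
    return pd.DataFrame(values.reshape(-1, width))


# Stand-in for the text file from the question (assumed content)
sample = io.StringIO("12\n21\n23\n1\n23\n42\n12\n0\n")
df = records_to_frame(sample)
print(df)
```

Passing a file path instead of the `StringIO` object should work the same way, since `np.loadtxt` accepts both.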

CodePudding user response:

import pandas as pd

row = []
i = 0
data = []
with open('a.txt') as f:
    for line in f:
        i += 1
        row.append(int(line.strip()))
        # Every fourth value completes a record
        if i % 4 == 0:
            data.append(row)
            row = []
df = pd.DataFrame(data)

This results in df being:

    0   1   2   3
0   12  21  23  1
1   23  42  12  0