I have a .txt file like this:
12
21
23
1
23
42
12
0
Here <12,21,23> are features and <1> is the label; likewise <23,42,12> are features and <0> is the label, and so on. I want to turn this single-column text file into a pandas DataFrame with multiple columns, in the format {column1,column2,column3,column4}, with no column names. Can someone please help me out with this? Thanks
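A minimal sketch of one way to do this with pandas alone, assuming the file holds exactly one integer per line and every record is three features followed by a label. `io.StringIO` stands in for the real file here and the name `text` is hypothetical; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical file contents: one value per line, four lines per record
text = "12\n21\n23\n1\n23\n42\n12\n0\n"

# Read the single unnamed column, then fold every 4 values into one row
values = pd.read_csv(io.StringIO(text), header=None)[0].to_numpy()
df = pd.DataFrame(values.reshape(-1, 4))
print(df)
```

The resulting columns are just the default integer labels 0 to 3, so no column names are set.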
Answer:
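import pandas as pd

df = dict()
features = list()
label = ''
filename = '.txt'
with open(filename) as fd:
    i = 0
    for line in fd:
        if i != 3:
            # the first three lines of each record are features
            features.append(line.strip())
            i += 1
        else:
            # the fourth line is the label; store the features under it
            label = line.strip()
            i = 0
            df[label] = features
            features = list()
# note: records that share a label overwrite each other in the dict
df = pd.DataFrame(df)
df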
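One caveat with keying the dict by label: labels repeat, so a later record with the same label silently replaces an earlier one. The sketch below assumes the same three-features-plus-label layout and uses a hypothetical in-memory list `lines` in place of the file (its third record reuses label 1); it contrasts the label-keyed dict with collecting each record as a row, so nothing is lost.

```python
import pandas as pd

# Hypothetical file lines; the third record reuses label "1"
lines = ["12", "21", "23", "1", "23", "42", "12", "0", "5", "6", "7", "1"]

# Keyed by label: the third record overwrites the first
by_label = {}
for start in range(0, len(lines), 4):
    by_label[lines[start + 3]] = lines[start:start + 3]

# Collected as rows: every record survives, converted to int
rows = [[int(v) for v in lines[start:start + 4]] for start in range(0, len(lines), 4)]
df = pd.DataFrame(rows)
print(len(by_label), len(df))
```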
Answer:
import pandas as pd

with open(<FILEPATH>, "r") as f:
    lines = f.readlines()
formatted = [int(line.strip()) for line in lines if line.strip()]  # Remove \n, skip blank lines and convert to int
labels = formatted[3::4]
features = list(zip(formatted[::4], formatted[1::4], formatted[2::4]))  # Modify this if each record has more than three features
data = {}
for i, label in enumerate(labels):
    data[label] = list(features[i])  # note: a repeated label overwrites the earlier column
df = pd.DataFrame(data)
Comment if you have any questions or find any errors, and I will make amendments.
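The slicing idea generalizes to any record length. A sketch, assuming each record is `n_features` values followed by one label (`values` and `n_features` are hypothetical names): `values[k::step]` picks the k-th field of every record, and zipping the feature slices rebuilds the records, here stored one per row.

```python
import pandas as pd

values = [12, 21, 23, 1, 23, 42, 12, 0]  # hypothetical parsed file contents
n_features = 3                           # features per record (assumption)
step = n_features + 1                    # features plus one label

# values[k::step] is the k-th field of every record
feature_columns = [values[k::step] for k in range(n_features)]
labels = values[n_features::step]

# zip(*...) transposes the per-field slices back into per-record tuples
rows = [list(feats) + [label] for feats, label in zip(zip(*feature_columns), labels)]
df = pd.DataFrame(rows)
print(df)
```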
Answer:
You can load the values with numpy first; you need to ensure that the number of values is a multiple of 4.
Each record as a column, with the label as the header:
import numpy as np
import pandas as pd

a = np.loadtxt('file.txt').reshape((4,-1), order='F')
df = pd.DataFrame(a[:-1], columns=a[-1])
Output:
    1.0   0.0
0  12.0  23.0
1  21.0  42.0
2  23.0  12.0
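Why `order='F'` gives one record per column: Fortran (column-major) order fills the first column top to bottom before moving on, so each run of four consecutive values lands in its own column. A small comparison on the sample values, held in a hypothetical in-memory array instead of `file.txt`:

```python
import numpy as np

values = np.array([12, 21, 23, 1, 23, 42, 12, 0])

# Row-major (default 'C'): consecutive values fill across each row
by_row = values.reshape((4, -1))
# Column-major ('F'): consecutive values fill down each column
by_col = values.reshape((4, -1), order="F")

print(by_row)
print(by_col)
```

With the default order the first column would be 12, 23, 23, 12, mixing two records, which is why the answer passes `order='F'`.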
Each record as a new row:
a = np.loadtxt('file.txt').reshape((-1,4))
df = pd.DataFrame(a)
Output:
      0     1     2    3
0  12.0  21.0  23.0  1.0
1  23.0  42.0  12.0  0.0
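The answer notes the number of values must be a multiple of 4; otherwise `reshape` raises a `ValueError` whose message only mentions array sizes. A sketch of an explicit check, using a hypothetical helper `to_records` with a hypothetical `record_len` parameter, that fails early with a clearer message:

```python
import numpy as np
import pandas as pd


def to_records(values, record_len=4):
    """Fold a flat sequence into a DataFrame, one record per row (hypothetical helper)."""
    a = np.asarray(values)
    if a.size % record_len != 0:
        raise ValueError(f"got {a.size} values, which is not a multiple of {record_len}")
    return pd.DataFrame(a.reshape((-1, record_len)))


df = to_records([12, 21, 23, 1, 23, 42, 12, 0])
print(df)

try:
    to_records([12, 21, 23, 1, 23])
except ValueError as err:
    message = str(err)
    print(message)
```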
Answer:
import pandas as pd

row = []
i = 0
data = []
with open('a.txt') as f:  # the with block closes the file automatically
    for line in f:
        i += 1
        row.append(int(line.strip()))
        if i % 4 == 0:
            # every fourth value completes a record: store it as one row
            data.append(row)
            row = []
df = pd.DataFrame(data)
This results in the following df:
    0   1   2  3
0  12  21  23  1
1  23  42  12  0
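The manual counter can also be replaced by grouping the values in fours directly. A sketch that zips four references to one shared iterator, so each tuple pulls the next four consecutive values; any trailing incomplete record is silently dropped, so check the count first if that matters. `io.StringIO` stands in for `a.txt` here as an assumption.

```python
import io

import pandas as pd

# Stand-in for open('a.txt'): one integer per line
f = io.StringIO("12\n21\n23\n1\n23\n42\n12\n0\n")

values = [int(line) for line in f if line.strip()]
it = iter(values)
# zip over four references to the same iterator yields consecutive 4-tuples
data = [list(record) for record in zip(it, it, it, it)]
df = pd.DataFrame(data)
print(df)
```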