Txt to array or data frame in Python with same size rows

I am working with Python and NumPy. I have a txt file of space-separated integers, and each row of the file must become a row in an array or DataFrame. The problem is that not every row has the same length. I know the length I want the rows to have, and I want to fill the missing values with zeros. Because the file is not comma-separated, I can't find a way to do that. Is there a way to find the length of each row and append the appropriate number of zeros? Any other ideas? I am new to the NumPy library, as you can see.

CodePudding user response：

As you know the number of columns in your data, this can be done using pandas read_csv. If 'test.txt' is the file you're trying to read:

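import pandas as pd

# Rows with fewer than five fields are padded with NaN, which fillna replaces with 0.
df = pd.read_csv("test.txt", sep=" ", names=["col1", "col2", "col3", "col4", "col5"])
df.fillna(0, inplace=True)
print(df)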

   col1  col2  col3  col4  col5
0     1     2     3   4.0   0.0
1     1     2     3   0.0   0.0
2     1     2     3   4.0   5.0

I have called the columns 'colX' but you can of course put whatever you'd like.

If you'd like to obtain the numpy array from this, you can call df.values:

array([[1., 2., 3., 4., 0.],
       [1., 2., 3., 0., 0.],
       [1., 2., 3., 4., 5.]])
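
Since the question asks for integers, the float result can be cast back after filling. A minimal sketch, assuming the three sample rows implied by the output above; io.StringIO is only a stand-in for the real file, and n_cols is an illustrative name for the known row length:

```python
import io
import pandas as pd

# Stand-in for test.txt; these rows are assumed from the printed output.
sample = io.StringIO("1 2 3 4\n1 2 3\n1 2 3 4 5\n")

n_cols = 5  # the known target row length (assumption for this example)

# Giving read_csv exactly n_cols names makes shorter rows come back padded with NaN.
df = pd.read_csv(sample, sep=" ", header=None, names=list(range(n_cols)))

# Replace NaN with 0, then cast the whole frame to integers.
arr = df.fillna(0).astype(int).to_numpy()
print(arr)
```

Casting only after fillna matters here, because a column that still contains NaN cannot be converted to a plain integer dtype.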

CodePudding user response：

The question does not include much detail, so I am assuming that consecutive blank spaces in the file indicate missing values.

If you can add a sample text file to the question, the solution can be more specific.

Based on that assumption, here is a possible solution:

import numpy as np
import pandas as pd

# Consecutive spaces produce empty fields, which become 0; blank lines are skipped.
with open(r"path\to\the\text\file\file.txt", "r") as f:
    val = np.array([[int(y) if y != "" else 0 for y in x.split(" ")] for x in f.read().splitlines() if x])

df = pd.DataFrame(val)
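
If the rows are simply shorter than the target length, as the question describes, rather than containing doubled spaces, the list comprehension above would produce rows of unequal length. A rough sketch of padding by hand with NumPy follows; pad_rows is a hypothetical helper name, and the sample lines are made up for illustration:

```python
import numpy as np

def pad_rows(lines, width):
    """Parse space-separated integers and right-pad each row with zeros."""
    # Start from an all-zero array so missing trailing values are already 0.
    out = np.zeros((len(lines), width), dtype=int)
    for i, line in enumerate(lines):
        values = [int(tok) for tok in line.split()]
        # Copy the values that exist; anything beyond `width` is dropped.
        out[i, :len(values)] = values[:width]
    return out

# Illustrative input; with a real file, use f.read().splitlines() instead.
lines = ["1 2 3 4", "1 2 3", "1 2 3 4 5"]
padded = pad_rows(lines, 5)
print(padded)
```

The result can be passed to pd.DataFrame(padded) in the same way as val above.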