txt to array or data frame in python with same size rows

Time:05-27

I am working with Python and NumPy. I have a txt file of space-separated integers, and each row of the file must become a row in an array or DataFrame. The problem is that not every row has the same length. I know the length I want the rows to have, and I want to fill the missing values with zeros. Because the file is not comma-separated, I can't find a way to do that. Is there a way to find the length of each row and append the appropriate number of zeros? Any other ideas? I am new to the NumPy library, as you can see.

CodePudding user response:

Since you know the number of columns in your data, you can do this with pandas read_csv. If 'test.txt' is the file you're trying to read:

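import pandas as pd

# Rows with fewer than five values are padded with NaN on the right.
df = pd.read_csv("test.txt", sep=" ", names=["col1", "col2", "col3", "col4", "col5"])
print(df)
    col1    col2    col3    col4    col5
0   1       2       3       4.0     NaN
1   1       2       3       NaN     NaN
2   1       2       3       4.0     5.0

# Replace the NaN padding with zeros.
df.fillna(0, inplace=True)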

I have called the columns 'colX' but you can of course put whatever you'd like.

If you'd like to obtain the NumPy array from this, you can call df.values (or df.to_numpy(), which the pandas documentation recommends) after filling the missing values:

array([[1., 2., 3., 4., 0.],
       [1., 2., 3., 0., 0.],
       [1., 2., 3., 4., 5.]])
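
The array above comes back as floats because NaN forces the padded columns to a float dtype. Since the question asks for integers, here is a minimal sketch of casting back after the fill. The sample text, the use of io.StringIO in place of a real file, and the integer column names are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical file contents; io.StringIO stands in for a file on disk.
sample = "1 2 3 4\n1 2 3\n1 2 3 4 5\n"
n_cols = 5  # the row length known in advance

# Passing n_cols names makes pandas pad the shorter rows with NaN.
df = pd.read_csv(io.StringIO(sample), sep=" ", header=None, names=list(range(n_cols)))

# Fill the NaN padding first, then cast the float columns back to int.
arr = df.fillna(0).astype(int).to_numpy()
print(arr)
```

This only works if no row has more than n_cols values; a longer row makes read_csv raise a parsing error.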

CodePudding user response:

The question does not include much detail, so I am assuming the text file looks like this:

3 4 12 7 9
3 4 8  7
9   9 
1 2   3

So, in the file, consecutive blank spaces indicate missing values.
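
To see why consecutive spaces can be read as missing values, compare the two ways of splitting a line. This is a small illustration on a string taken from the sample above, not part of the original answer.

```python
line = "3 4 8  7"  # two spaces between 8 and 7

# Splitting on a single space keeps an empty string for the extra blank.
print(line.split(" "))  # ['3', '4', '8', '', '7']

# Splitting on any whitespace collapses the run and loses the gap.
print(line.split())     # ['3', '4', '8', '7']
```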

If you can add a sample text file to the question, the solution can be more specific.

Based on that assumption, here is a possible solution:

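import numpy as np
import pandas as pd

with open(r"path\to\the\text\file\file.txt", "r") as f:
    # Splitting on a single space turns each extra blank into an empty
    # string, which is read as a missing value and replaced with 0.
    # splitlines() avoids the empty final row that split("\n") yields
    # when the file ends with a newline.
    val = np.array([[int(y) if y != "" else 0 for y in x.split(" ")] for x in f.read().splitlines()])

df = pd.DataFrame(val)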
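
If the rows are simply shorter at the end rather than marked by extra spaces, which is how the question reads, one option is to pad each row up to the known length. Below is a minimal sketch under that assumption. The helper name pad_rows and the sample text are hypothetical, and any run of whitespace is treated as one separator.

```python
import numpy as np


def pad_rows(lines, n_cols):
    """Return an int array with one zero-padded row per non-empty line."""
    # Parse every non-blank line into a list of ints.
    rows = [[int(tok) for tok in line.split()] for line in lines if line.strip()]

    # Start from zeros so that any missing trailing values stay 0.
    arr = np.zeros((len(rows), n_cols), dtype=int)
    for i, row in enumerate(rows):
        if len(row) > n_cols:
            raise ValueError(f"row {i} has {len(row)} values, expected at most {n_cols}")
        arr[i, :len(row)] = row
    return arr


# Hypothetical file contents with rows of different lengths.
sample = "3 4 12 7 9\n3 4 8\n9 9\n1 2 3\n"
arr = pad_rows(sample.splitlines(), 5)
print(arr)
```

With a real file, the open file object can be passed directly as lines, since iterating over it yields one line at a time. The result can be wrapped with pd.DataFrame(arr) if a DataFrame is needed.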