I am working with Python and NumPy. I have a txt file with space-separated integers, and each row of the file must become a row in an array or DataFrame. The problem is that not every row has the same length. I know the length I want the rows to have, and I want to fill the missing values with zeros. Since the file is not comma-separated, I can't find a way to do that. I was wondering if there is a way to find the length of each row of my array and add the appropriate number of zeros. Is that possible? Any other ideas? I am new to the NumPy library, as you can see.
CodePudding user response:
As you know the number of columns in your data, this can be done using pandas read_csv. If 'test.txt' is the file you're trying to read:
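import pandas as pd

# Rows with fewer than five values are padded with NaN on the right
df = pd.read_csv("test.txt", sep=" ", names=["col1", "col2", "col3", "col4", "col5"])
print(df)
   col1  col2  col3  col4  col5
0     1     2     3   4.0   NaN
1     1     2     3   NaN   NaN
2     1     2     3   4.0   5.0
Then replace the missing values with zeros:
df.fillna(0, inplace=True)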
I have called the columns 'colX' but you can of course put whatever you'd like.
If you'd like to obtain the numpy array from this, you can call df.values:
array([[1., 2., 3., 4., 0.],
       [1., 2., 3., 0., 0.],
       [1., 2., 3., 4., 5.]])
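The array above is float because NaN cannot be stored in an integer column. Since the file holds only integers, one option is to cast after filling. This is a minimal sketch: the `io.StringIO` sample is a hypothetical stand-in for test.txt so that it runs on its own, and `names` and `arr` are illustrative names.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.txt so the sketch is self-contained.
sample = io.StringIO("1 2 3 4\n1 2 3\n1 2 3 4 5\n")

names = ["col1", "col2", "col3", "col4", "col5"]
df = pd.read_csv(sample, sep=" ", names=names)

# Fill the NaN cells first, then cast: NaN itself cannot be converted to int.
arr = df.fillna(0).astype(int).to_numpy()
print(arr)
```

With a real file, pass the path to read_csv instead of the StringIO object.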
CodePudding user response:
The question does not include much detail, so I am assuming the text file looks like this:
3 4 12 7 9
3 4 8 7
9 9
1 2 3
Here, consecutive blank spaces in the file indicate missing values.
If you can add a sample text file to the question, the solution can be more specific.
Based on that assumption, here is a possible solution:
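import numpy as np
import pandas as pd

# Assumes every line splits into the same number of fields on single spaces
with open(r"path\to\the\text\file\file.txt", "r") as f:
    # An empty field (between two consecutive spaces) is a missing value and becomes 0;
    # splitlines() avoids an extra empty row when the file ends with a newline
    val = np.array([[int(y) if y != "" else 0 for y in x.split(" ")] for x in f.read().splitlines()])
df = pd.DataFrame(val)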
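If the rows are simply shorter at the end, as the question describes, rather than having gaps in the middle, a plain NumPy approach is to allocate a zero array of the known width and copy each row into it. This is a rough sketch under that assumption: `load_padded`, `sample_lines`, and `width` are hypothetical names, and the sample lines stand in for the file contents.

```python
import numpy as np


def load_padded(lines, width):
    """Parse space-separated integers and right-pad each row with zeros."""
    # split() with no argument ignores repeated spaces; blank lines are skipped.
    rows = [[int(tok) for tok in line.split()] for line in lines if line.strip()]
    out = np.zeros((len(rows), width), dtype=int)
    for i, row in enumerate(rows):
        if len(row) > width:
            raise ValueError(f"row {i} has {len(row)} values, expected at most {width}")
        # Copy the values that exist; the rest of the row stays zero.
        out[i, :len(row)] = row
    return out


# Hypothetical sample standing in for the lines of the text file.
sample_lines = ["3 4 12 7 9", "3 4 8 7", "9 9", "1 2 3"]
arr = load_padded(sample_lines, width=5)
print(arr)
```

An open file object can be passed in place of the list, since iterating over a file yields its lines.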