I have a text file that contains strings, floating-point numbers, and integers, separated by double spaces:
cat input.txt
nms val pet dzl
sdt 2.5 3.5 1
tyu 2.8 7.5 5
I want to load the text file and assign each row's values to new variables in order to perform some task inside a loop.
My trial script:
import numpy as np
import pandas as pd
main_file=np.loadtxt("input.txt")
for file in main_file:
a= should be sdt
b= should be 2.5
c= should be 3.5
d= should be 1
Similarly, for the second row I want to do the same, so that
a= should be tyu
b= should be 2.8
c= should be 7.5
d= should be 5
Error: ValueError: could not convert string to float: 'sdt'
How can I fix this?
CodePudding user response:
I guess in your example a combination of open with readlines and split with two whitespaces does the job.
# Using readlines()
with open('input.txt', 'r') as file1:
    lines = file1.readlines()
for i, line in enumerate(lines):
    if i == 0:
        continue  # skip the header row
    # Strip the newline character, then split on the two-space separator
    a, b, c, d = line.strip().split(' ' * 2)
    print(f'In line {i} a is {a}, b is {b}, c is {c} and d is {d}')
This is the output:
In line 1 a is sdt, b is 2.5, c is 3.5 and d is 1
In line 2 a is tyu, b is 2.8, c is 7.5 and d is 5
If you have more than 4 columns, you can call split() without arguments (it splits on any run of whitespace) and assign the result to a single variable. This will be a list, and you can select the items by zero-based index.
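As a rough sketch of that list-based variant: the inline sample string below is an assumption standing in for input.txt, and the names `sample`, `rows`, and `fields` are only illustrative.

```python
import io

# Sample data standing in for input.txt (assumed to be double-space separated).
sample = "nms  val  pet  dzl\nsdt  2.5  3.5  1\ntyu  2.8  7.5  5\n"

rows = []
for i, line in enumerate(io.StringIO(sample)):
    if i == 0:
        continue  # skip the header row
    fields = line.split()  # no argument: split on any run of whitespace
    rows.append(fields)
    # Items are selected by zero-based index; every item is still a string.
    print(f"row {i}: first={fields[0]}, last={fields[-1]}, columns={len(fields)}")
```

Because the list holds however many columns the line has, the same loop should work unchanged if columns are added later.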
CodePudding user response:
In [244]: cat test.csv
nms val pet dzl
sdt 2.5 3.5 1
tyu 2.8 7.5 5
An easy way to load a csv is with pandas. I use the engine and sep parameters to handle the whitespace separator (rather than the default comma):
In [245]: df = pd.read_csv('test.csv', sep=r'\s+', engine='python')
In [246]: df
Out[246]:
   nms  val  pet  dzl
0  sdt  2.5  3.5    1
1  tyu  2.8  7.5    5
Since you tagged dataframe and pandas, I'll assume you can take it from there.
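For the per-row variables the question asks about, one possible way to continue from the DataFrame is `itertuples`. This is a minimal sketch, assuming four columns as in the sample; the inline string stands in for test.csv and the name `seen` is only illustrative.

```python
import io

import pandas as pd

# Sample data standing in for test.csv (assumed whitespace separated).
sample = "nms  val  pet  dzl\nsdt  2.5  3.5  1\ntyu  2.8  7.5  5\n"
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

seen = []
# index=False drops the row index, so each tuple unpacks into the four columns.
for a, b, c, d in df.itertuples(index=False):
    seen.append((a, b, c, d))
    print(a, b, c, d)
```

Here `a` is a string while `b`, `c`, and `d` are already numeric, since pandas infers the column types on load.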
genfromtxt can read it as well, but for strings that aren't floats it inserts a nan:
In [247]: data = np.genfromtxt('test.csv')
In [248]: data
Out[248]:
array([[nan, nan, nan, nan],
       [nan, 2.5, 3.5, 1. ],
       [nan, 2.8, 7.5, 5. ]])
np.loadtxt given the same input raises an error because it can't convert those strings to float. That should be clear from the docs.
With a few more parameters you can get a nice structured array:
In [251]: data = np.genfromtxt('test.csv', dtype=None, names=True, encoding=None)
In [252]: data
Out[252]:
array([('sdt', 2.5, 3.5, 1), ('tyu', 2.8, 7.5, 5)],
      dtype=[('nms', '<U3'), ('val', '<f8'), ('pet', '<f8'), ('dzl', '<i8')])
data['val'] gives the whole val column. Or you can iterate over the rows with:
In [258]: for row in data:
     ...:     print(row)
     ...:
('sdt', 2.5, 3.5, 1)
('tyu', 2.8, 7.5, 5)
A similar structured array from loadtxt:
In [262]: data = np.loadtxt('test.csv', dtype='str,f,f,i', skiprows=1,encoding=None)
In [263]: data
Out[263]:
array([('', 2.5, 3.5, 1), ('', 2.8, 7.5, 5)],
dtype=[('f0', '<U'), ('f1', '<f4'), ('f2', '<f4'), ('f3', '<i4')])
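In the loadtxt output above the first field comes back as an empty string, because a bare `str` in the dtype has zero width. A sketch of one way around that, assuming names of at most three characters, is to give the field an explicit width such as `U3`. The inline string stands in for test.csv.

```python
import io

import numpy as np

# Sample data standing in for test.csv (assumed whitespace separated).
sample = "nms  val  pet  dzl\nsdt  2.5  3.5  1\ntyu  2.8  7.5  5\n"

# 'U3' reserves three characters for the name; a plain 'str' would keep none.
data = np.loadtxt(
    io.StringIO(sample),
    dtype="U3,f8,f8,i8",
    skiprows=1,  # skip the header row
    encoding="utf-8",
)

# Without names, the fields get the default names f0..f3.
for a, b, c, d in data:
    print(a, b, c, d)
```

Names longer than the declared width would be truncated, so the width needs to match the longest value you expect.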
With base readlines you can get a list of strings, one per row, and parse those as you want:
In [264]: with open('test.csv','r') as f: lines = f.readlines()
In [265]: lines
Out[265]: ['nms val pet dzl\n', 'sdt 2.5 3.5 1\n', 'tyu 2.8 7.5 5\n']
In [266]: lines[1]
Out[266]: 'sdt 2.5 3.5 1\n'
In [267]: lines[1].split()
Out[267]: ['sdt', '2.5', '3.5', '1']
In [268]: a,b,c,d = lines[1].split()
In [269]: b
Out[269]: '2.5'
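As the last output shows, the unpacked values are still strings. A small sketch of converting them to the types the question wants, assuming the column order name, float, float, int; `parse_row` is a hypothetical helper, not part of any library.

```python
def parse_row(line):
    """Split one data line and convert each field (assumed order: str, float, float, int)."""
    name, val, pet, dzl = line.split()
    return name, float(val), float(pet), int(dzl)


# Same list of lines as readlines() returned above.
lines = ["nms val pet dzl\n", "sdt 2.5 3.5 1\n", "tyu 2.8 7.5 5\n"]

# lines[1:] skips the header row.
parsed = [parse_row(line) for line in lines[1:]]

for a, b, c, d in parsed:
    # b, c, and d are numbers now, so arithmetic works inside the loop.
    print(a, b + c, d)
```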