Reading values from a text file and defining variables

I have a text file that contains strings, floating-point numbers and integers, separated by multiple spaces:

cat input.txt

nms   val   pet  dzl
sdt   2.5   3.5   1
tyu   2.8   7.5   5

I want to load the text file and assign each row's values to new variables so that I can perform some task inside the loop.

My attempt:

import numpy as np
import pandas as pd
main_file = np.loadtxt("input.txt")
for file in main_file:
    a = ...  # should be 'sdt'
    b = ...  # should be 2.5
    c = ...  # should be 3.5
    d = ...  # should be 1

Similarly, for the second row I want:

    a = ...  # should be 'tyu'
    b = ...  # should be 2.8
    c = ...  # should be 7.5
    d = ...  # should be 5

Running this fails with: ValueError: could not convert string to float: 'sdt'. How can I fix this?

Answer:

I think in your example a combination of open with readlines and split does the job.

# Using readlines()
with open('input.txt', 'r') as file1:
    lines = file1.readlines()

for i, line in enumerate(lines):
    if i == 0 or not line.strip():
        continue  # skip the header row and any blank lines
    # split() with no argument splits on any run of whitespace
    # and also drops the trailing newline character
    a, b, c, d = line.split()
    print(f'In line {i} a is {a}, b is {b}, c is {c} and d is {d}.')

This is the output:

In line 1 a is sdt, b is 2.5, c is 3.5 and d is 1.
In line 2 a is tyu, b is 2.8, c is 7.5 and d is 5.

If you have more than 4 columns, assign the result of split() to a single variable instead. It is a list, so you can select the items by zero-based index.
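Because split() returns strings, b is '2.5' rather than the number 2.5. A minimal sketch that converts each field while looping, assuming the column order from the question (str, float, float, int); parse_rows is a hypothetical helper name, and io.StringIO stands in for the real input.txt:

```python
import io

# Stand-in for input.txt (assumption: same layout as the question's file)
SAMPLE = """\
nms   val   pet  dzl
sdt   2.5   3.5   1
tyu   2.8   7.5   5
"""


def parse_rows(f):
    """Hypothetical helper: yield (str, float, float, int) per data line."""
    next(f)  # skip the header line
    for line in f:
        fields = line.split()
        if not fields:  # ignore blank lines
            continue
        a, b, c, d = fields
        yield a, float(b), float(c), int(d)


# With a real file: with open('input.txt') as f: rows = list(parse_rows(f))
rows = list(parse_rows(io.StringIO(SAMPLE)))
for a, b, c, d in rows:
    print(a, b, c, d)  # b and c are floats, d is an int
```

Using a generator keeps the loop body free of parsing details, and the same helper works unchanged on an open file object.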

Answer:

In [244]: cat test.csv
nms   val   pet  dzl
sdt   2.5   3.5   1
tyu   2.8   7.5   5

An easy way to load a file like this is with pandas. I set sep (and engine) so that runs of whitespace act as the separator, rather than the default comma:

In [245]: df = pd.read_csv('test.csv', sep=r'\s+', engine='python')
In [246]: df
Out[246]: 
   nms  val  pet  dzl
0  sdt  2.5  3.5    1
1  tyu  2.8  7.5    5

Since you tagged dataframe and pandas, I'll assume you can take it from there.
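To get from the DataFrame back to one set of variables per row, DataFrame.itertuples is one option. A minimal sketch using sep=r'\s+' (a regular expression matching one or more whitespace characters), with io.StringIO standing in for the file:

```python
import io

import pandas as pd

# Stand-in for test.csv / input.txt
SAMPLE = """\
nms   val   pet  dzl
sdt   2.5   3.5   1
tyu   2.8   7.5   5
"""

# r'\s+' treats any run of whitespace as a single separator
df = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+')

# itertuples(index=False) yields one tuple per row, in column order
rows = [tuple(row) for row in df.itertuples(index=False)]
for a, b, c, d in rows:
    print(a, b, c, d)
```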

genfromtxt can read it as well, but every field that isn't a number, including the whole header row, becomes nan:

In [247]: data = np.genfromtxt('test.csv')
In [248]: data
Out[248]: 
array([[nan, nan, nan, nan],
       [nan, 2.5, 3.5, 1. ],
       [nan, 2.8, 7.5, 5. ]])
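If only the numeric columns are needed, the skip_header and usecols parameters of genfromtxt drop the header row and the string column, so no nan appears. A minimal sketch, with io.StringIO standing in for the file:

```python
import io

import numpy as np

# Stand-in for test.csv / input.txt
SAMPLE = """\
nms   val   pet  dzl
sdt   2.5   3.5   1
tyu   2.8   7.5   5
"""

# Skip the header line and read only columns 1-3 (val, pet, dzl)
nums = np.genfromtxt(io.StringIO(SAMPLE), skip_header=1, usecols=(1, 2, 3))
print(nums)
```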

np.loadtxt, given the same file, raises the ValueError you saw: its default dtype is float, and it can't convert strings such as the ones in the first column. The dtype parameter is described in the docs.
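One way to keep using loadtxt is to read every field as text with dtype=str and convert the numeric fields yourself. A minimal sketch, with io.StringIO standing in for input.txt and the str/float/float/int column order assumed from the question:

```python
import io

import numpy as np

# Stand-in for input.txt
SAMPLE = """\
nms   val   pet  dzl
sdt   2.5   3.5   1
tyu   2.8   7.5   5
"""

# dtype=str keeps every field as text, so 'sdt' no longer raises ValueError
raw = np.loadtxt(io.StringIO(SAMPLE), dtype=str, skiprows=1)

parsed = []
for a, b, c, d in raw:  # each row is a 1-D array of 4 strings
    parsed.append((str(a), float(b), float(c), int(d)))
print(parsed)
```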

With a few more parameters you can get a nice structured array:

In [251]: data = np.genfromtxt('test.csv', dtype=None, names=True, encoding=None)
In [252]: data
Out[252]: 
array([('sdt', 2.5, 3.5, 1), ('tyu', 2.8, 7.5, 5)],
      dtype=[('nms', '<U3'), ('val', '<f8'), ('pet', '<f8'), ('dzl', '<i8')])

data['val'] gives the whole val column. Or you can iterate over the rows:

In [258]: for row in data:
     ...:     print(row)
     ...: 
('sdt', 2.5, 3.5, 1)
('tyu', 2.8, 7.5, 5)
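To assign a, b, c and d per row as the question asks, one option is to turn the structured array into plain Python tuples with tolist() and unpack them. A minimal sketch repeating the genfromtxt call above, with io.StringIO standing in for the file:

```python
import io

import numpy as np

# Stand-in for test.csv / input.txt
SAMPLE = """\
nms   val   pet  dzl
sdt   2.5   3.5   1
tyu   2.8   7.5   5
"""

data = np.genfromtxt(io.StringIO(SAMPLE), dtype=None, names=True, encoding=None)

# tolist() converts each record to a tuple of Python str/float/int values
for a, b, c, d in data.tolist():
    print(a, b, c, d)

# Individual columns are available by the names taken from the header row
vals = data['val']
```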

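A similar structured array from loadtxt. Note that the plain 'str' field type has zero length, so the names come out as empty strings; a sized type such as 'U3' would keep them: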

In [262]: data = np.loadtxt('test.csv', dtype='str,f,f,i', skiprows=1,encoding=None)
In [263]: data
Out[263]: 
array([('', 2.5, 3.5, 1), ('', 2.8, 7.5, 5)],
      dtype=[('f0', '<U'), ('f1', '<f4'), ('f2', '<f4'), ('f3', '<i4')])
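A sketch of the same loadtxt call with a sized string field, so the names survive. The field names and the 'U10' width (room for up to 10 characters) are assumptions, and io.StringIO stands in for the file:

```python
import io

import numpy as np

# Stand-in for test.csv / input.txt
SAMPLE = """\
nms   val   pet  dzl
sdt   2.5   3.5   1
tyu   2.8   7.5   5
"""

# Named fields with an explicit string width instead of the zero-length 'str'
dt = [('nms', 'U10'), ('val', 'f8'), ('pet', 'f8'), ('dzl', 'i8')]
data = np.loadtxt(io.StringIO(SAMPLE), dtype=dt, skiprows=1, encoding=None)
print(data)
```

Naming the fields also lets you write data['nms'] instead of data['f0'].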

With plain Python readlines you get a list of strings, one per line, which you can parse however you want:

In [264]: with open('test.csv','r') as f: lines = f.readlines()
In [265]: lines
Out[265]: ['nms   val   pet  dzl\n', 'sdt   2.5   3.5   1\n', 'tyu   2.8   7.5   5\n']
In [266]: lines[1]
Out[266]: 'sdt   2.5   3.5   1\n'
In [267]: lines[1].split()
Out[267]: ['sdt', '2.5', '3.5', '1']
In [268]: a,b,c,d = lines[1].split()
In [269]: b
Out[269]: '2.5'
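The same split applies to every line, and pairing it with the split header line gives each value its column name. A minimal sketch, with io.StringIO standing in for test.csv; the values remain strings, just like b above:

```python
import io

# Stand-in for test.csv
SAMPLE = """\
nms   val   pet  dzl
sdt   2.5   3.5   1
tyu   2.8   7.5   5
"""

f = io.StringIO(SAMPLE)  # stand-in for open('test.csv', 'r')
lines = f.readlines()

header = lines[0].split()  # ['nms', 'val', 'pet', 'dzl']
records = [dict(zip(header, line.split())) for line in lines[1:] if line.strip()]

for rec in records:
    print(rec['nms'], rec['val'], rec['pet'], rec['dzl'])
```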