Parse flat-file (positional text-file) to read the wavelength

I have the following text file with data:

FI R       83.0000m   34.960    1.1262      Fe 2      1.32055m   33.626    0.0522      N  
2      5754.61A   33.290    0.0241
TI R       1800.00m   33.092    0.0153      Fe 2      1.24854m   32.645    0.0054      N  
2      915.612A   31.997    0.0012
NI Ra      2.85000m   36.291   24.1132      Fe 2      7637.54A   33.077    0.0147

What I want is to obtain the third column of each group, which is the wavelength of the emergent line, but my problem comes when I put the condition in the if statement.

(Name1, ion1, wavelength1, da1, de1,
 name2, ion2, wavelength2, da2, de2,
 name3, ion3, wavelength3, da3, de3) = np.genfromtxt(
    'Emergent_line.txt', skip_header=3, delimiter="", unpack=True)

if(Name1=="Fe" and ion1==2):
    print(wavelength1)
elif(name2=="Fe" and ion2==2):
    print(wavelength2)
elif(name3=="Fe" and ion3==2):
    print(wavelength3)

In the text file I want to find the wavelength for Fe 2, but I think the problem is that each wavelength has a letter at the end. I don't want to delete the letters by hand, because I have a long list like this. I tried other approaches, but I haven't solved it.

CodePudding user response：

I think you are better off using a regex.

Example:

import re


text = '''FI R       83.0000m   34.960    1.1262      Fe 2      1.32055m   33.626    0.0522      N  
2      5754.61A   33.290    0.0241
TI R       1800.00m   33.092    0.0153      Fe 2      1.24854m   32.645    0.0054      N  
2      915.612A   31.997    0.0012
NI Ra      2.85000m   36.291   24.1132      Fe 2      7637.54A   33.077    0.0147'''

find_this = re.findall(r'(Fe 2.*?[0-9].*?)\s', text)  # raw string avoids an invalid-escape warning for \s
print(find_this)

Output:

['Fe 2      1.32055m', 'Fe 2      1.24854m', 'Fe 2      7637.54A']

Or, if you only want the values:

find_this = re.findall(r'Fe 2.*?([0-9].*?)\s', text)  # raw string avoids an invalid-escape warning for \s

Output:

['1.32055m', '1.24854m', '7637.54A']

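Since the question mentions the trailing letter as the obstacle, the same regex idea can be taken one step further to split each match into a number and its unit letter. This is only a sketch: the pattern and the name fe2_lines are illustrative, and it assumes every wavelength is a plain decimal number followed by exactly one letter.

```python
import re

# Two sample rows copied from the question.
text = """FI R       83.0000m   34.960    1.1262      Fe 2      1.32055m   33.626    0.0522
NI Ra      2.85000m   36.291   24.1132      Fe 2      7637.54A   33.077    0.0147"""

# "Fe", whitespace, "2", whitespace, then a decimal number and one unit letter.
# Group 1 captures the number, group 2 captures the letter.
pattern = re.compile(r"\bFe\s+2\s+([0-9]+(?:\.[0-9]+)?)([A-Za-z])\b")

# findall returns one (number, letter) tuple per match when there are two groups.
fe2_lines = [(float(value), unit) for value, unit in pattern.findall(text)]
print(fe2_lines)
# [(1.32055, 'm'), (7637.54, 'A')]
```

Keeping the unit letter next to the value means nothing has to be deleted from the file, and the numbers can still be used in calculations.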

CodePudding user response：

Here is another idea, without using the re module:

someText ='FI R       83.0000m   34.960    1.1262'
someText.split()
#>> ['FI', 'R', '83.0000m', '34.960', '1.1262']
name1,ion1, lambda1, *other = someText.split()
lambda1 = float(lambda1[0:-1])
print(lambda1, other)
#>> 83.0 ['34.960', '1.1262']

You can use the .split() str method, which splits on the whitespace separating your data without the need for regex.

Regex is great for extracting text in more complex cases, usually when the input format varies. Since the format does not really vary in this case, you could use simpler str methods instead.
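To apply the .split() idea to the whole file rather than a single record, one possible sketch is to scan the token list for the name and ion and take the token that follows them. The helper name find_wavelengths is hypothetical, and the code assumes each wavelength ends in exactly one unit letter, as in the sample data.

```python
# Sample text from the question; a row may be wrapped across two lines.
text = """FI R       83.0000m   34.960    1.1262      Fe 2      1.32055m   33.626    0.0522      N
2      5754.61A   33.290    0.0241
NI Ra      2.85000m   36.291   24.1132      Fe 2      7637.54A   33.077    0.0147"""


def find_wavelengths(text, name, ion):
    """Return (value, unit) for every record whose name and ion match (hypothetical helper)."""
    tokens = text.split()  # splits on any whitespace, including newlines
    found = []
    for i in range(len(tokens) - 2):
        if tokens[i] == name and tokens[i + 1] == ion:
            raw = tokens[i + 2]  # e.g. '1.32055m'
            found.append((float(raw[:-1]), raw[-1]))  # number, unit letter
    return found


print(find_wavelengths(text, "Fe", "2"))
# [(1.32055, 'm'), (7637.54, 'A')]
print(find_wavelengths(text, "N", "2"))
# [(5754.61, 'A')]
```

Because the scan works on the flat token list, it does not matter whether a record was wrapped onto the next line. For a real file, the text could be obtained with open('Emergent_line.txt').read().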

CodePudding user response：

The text file you presented seems to be a flat file or fixed-width file, where the data (columns) are laid out

as positional text (each column starting at a predefined position)
in a fixed-width format (each column having a fixed width)

Pandas has a function for reading fixed-width files

You could use pandas and its IO tools function read_fwf.

import io  # just for demonstration without needing a file
import pandas


text = '''FI R       83.0000m   34.960    1.1262      Fe 2      1.32055m   33.626    0.0522      N  
2      5754.61A   33.290    0.0241
TI R       1800.00m   33.092    0.0153      Fe 2      1.24854m   32.645    0.0054      N  
2      915.612A   31.997    0.0012
NI Ra      2.85000m   36.291   24.1132      Fe 2      7637.54A   33.077    0.0147'''

buffer = io.StringIO(text)  # just a helper to read from text as from file

filepath_or_buffer = buffer  # can also be the file-path directly
df = pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, header=None)
print(df)  # df represented as complete table read

wave_lengths = df.loc[(df[3] == 'Fe') & (df[4] == 2)][5]
print("== Wavelengths:")
print(wave_lengths)

buffer.close()

Prints:

    0    1                            2    3    4         5       6       7    8
0  FI    R  83.0000m   34.960    1.1262   Fe  2.0  1.32055m  33.626  0.0522    N
1   2  NaN  5754.61A   33.290    0.0241  NaN  NaN       NaN     NaN     NaN  NaN
2  TI    R  1800.00m   33.092    0.0153   Fe  2.0  1.24854m  32.645  0.0054    N
3   2  NaN  915.612A   31.997    0.0012  NaN  NaN       NaN     NaN     NaN  NaN
4  NI   Ra  2.85000m   36.291   24.1132   Fe  2.0  7637.54A  33.077  0.0147  NaN
== Wavelengths:
0    1.32055m
2    1.24854m
4    7637.54A

Note:

Python's io.StringIO was used as a helper to simulate a buffer instead of a file.
The pandas loc indexer was used to filter the Fe 2 rows, from which we printed the column labelled 5, which holds the wavelength.
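The filtered wavelengths are still strings with a trailing unit letter. If numeric values are needed, one option is to split them with Series.str.extract. The sketch below rebuilds the filtered series by hand so that it runs on its own; the column names value and unit are illustrative, and it assumes each entry is a decimal number followed by exactly one letter.

```python
import pandas as pd

# Stand-in for the filtered wave_lengths series printed above.
wave_lengths = pd.Series(['1.32055m', '1.24854m', '7637.54A'], index=[0, 2, 4])

# Named groups become the column names of the resulting DataFrame.
parts = wave_lengths.str.extract(r'^(?P<value>[0-9.]+)(?P<unit>[A-Za-z])$')
parts['value'] = parts['value'].astype(float)  # convert the numeric part to float

print(parts)
```

This keeps the unit letter in its own column, so rows with different units can be told apart or converted later.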
