Elegant solution in python to extract data and put them under a basic array format

Time:06-12

Here are the first two vectors of an output data file that I generated with Wolfram Mathematica:

(* Created with the Wolfram Language for Students - Personal Use Only : www.wolfram.com *)
{{0.29344728841663786, 0.00037262711145454893, 0.7061800844719075,
  67.41431300170986, 1.3887122472912174, 0.0014182932914303275,
  500.97644711373647, 0.0002565333937360516, 105.86185844804378},
 {0.29479428399557506, 0.0007813301223490133, 0.7044243858820759,
  67.40475060370453, 1.3779372193629575, 0.00006103376259459755,
  500.30876628350757, 0.00001106337484454747, 101.39952463245301},
{...

I would like an elegant Python solution to convert this output file into a basic array format (no braces, just rows of 9 columns).

For the moment, I use an ugly method:

import os
import numpy as np

# Convert chain.m to final_array.txt
os.system("cat chain.m | tr '},' '\n' | tr '{{' ' ' | tr '{' ' ' | tr '}}' ' ' | gsed 's/\*\^-/e-/g' | gsed 's/\*\^/e/g' | grep -v '(' > out.txt")
a = np.loadtxt('out.txt')
os.system('rm -f out.txt')
# Regroup the loaded values into rows of 9 columns
nline = int(len(a) / 9)
b = np.reshape(a, (nline, 9))
np.savetxt('final_array.txt', b)

The resulting final_array.txt then contains:

0.29344728841663786 0.00037262711145454893 0.7061800844719075 67.41431300170986 1.3887122472912174 0.0014182932914303275 500.97644711373647 0.0002565333937360516 105.86185844804378
0.29479428399557506 0.0007813301223490133 0.7044243858820759 67.40475060370453 1.3779372193629575 0.00006103376259459755 500.30876628350757 0.00001106337484454747 101.39952463245301

I am convinced that a pretty and simple solution exists in Python and I would be glad to see it.

CodePudding user response:

Hard to say if the following is elegant or pretty, but I believe it is somewhat 'pythonic'. We can parse the Wolfram output shown above with the following function, which takes an open file object as input:

import re

def parse_wolfram(file_pointer):
    # the first line is the header, which we ignore
    _ = file_pointer.readline()
    row_str = ''
    out_data = []
    while True:
        # Read each line until EOF (or the first blank line),
        # stripping leading and trailing white space
        line = file_pointer.readline().strip()
        if not line:
            break

        # Append each line as a string to the current row
        row_str += line
        # Find '}' to detect the end of a row
        if '}' in line:
            # Parse the row:
            # 1. Use the regular expression module to split the string
            #    where the delimiter is one or more of the character set.
            #    This produces a list of string tokens.
            # 2. [1:-1] removes the empty string tokens at the head and
            #    tail of this list
            # 3. Use list comprehension to cast string tokens to float.
            # 4. Append list of floats for each row to output list of lists (2-D array)
            out_data.append([float(data) for data in re.split(r'[{, }]+', row_str)[1:-1]])
            # Reset for next row
            row_str = ''

    return out_data
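
To make the splitting step concrete, here is a small sketch of what the split produces for one joined row. The row string is a made-up, shortened example (not taken from the real file), and the pattern uses a `+` quantifier so that a run of delimiters such as `}, ` or `{{` counts as a single separator:

```python
import re

# Hypothetical, shortened row as it looks after the stripped lines are joined
row_str = "{{0.5, 0.25,67.0, 1.5},"

# Split on one or more of: '{', ',', ' ', '}'
tokens = re.split(r'[{, }]+', row_str)
print(tokens)   # ['', '0.5', '0.25', '67.0', '1.5', '']

# The row starts and ends with delimiters, so the first and last
# tokens are empty strings; [1:-1] drops them before the float cast
values = [float(t) for t in tokens[1:-1]]
print(values)   # [0.5, 0.25, 67.0, 1.5]
```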

The function can be used as follows on a file named 'chain.m', assuming that file is formatted as in the question:

    with open('chain.m', 'r', encoding='utf-8') as fp:
        parsed_output = parse_wolfram(fp)
        
    print(parsed_output)
    [[0.29344728841663786, 0.00037262711145454893, 0.7061800844719075, 67.41431300170986, 1.3887122472912174, 0.0014182932914303275, 500.97644711373647, 0.0002565333937360516, 105.86185844804378], [0.29479428399557506, 0.0007813301223490133, 0.7044243858820759, 67.40475060370453, 1.3779372193629575, 6.103376259459755e-05, 500.30876628350757, 1.106337484454747e-05, 101.39952463245301]]

This output is a Python list of lists of floats. It can be converted to a NumPy array using numpy.array(parsed_output).
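
If the goal is only the final 9-column text file, a shorter route may be to pull every number out of the file with one regular expression and reshape the result. The sketch below is an alternative, not the function above; the name `wolfram_text_to_array` and the `ncols` parameter are made up for illustration. It assumes the file holds only plain decimal numbers, optionally with Mathematica's `*^` exponent notation (which the `gsed` calls in the question handle), and that the total count of numbers is a multiple of `ncols`:

```python
import re
import numpy as np

# A decimal number, optionally followed by a Mathematica exponent such as *^-5
NUMBER = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:\*\^[-+]?\d+)?')

def wolfram_text_to_array(text, ncols=9):
    # Drop (* ... *) comments so digits inside them are not picked up
    body = re.sub(r'\(\*.*?\*\)', '', text, flags=re.DOTALL)
    # Rewrite 1.2*^-5 as 1.2e-5 so that float() accepts it
    values = [float(tok.replace('*^', 'e')) for tok in NUMBER.findall(body)]
    # Regroup the flat list into rows of ncols columns
    return np.array(values).reshape(-1, ncols)

# Small made-up sample with 3 columns instead of 9
sample = "(* header 1 *)\n{{1.5, 2.*^-3, 3},\n {4, 5.25*^2,\n  6}}"
arr = wolfram_text_to_array(sample, ncols=3)
print(arr.shape)   # (2, 3)
```

For the real file, something like `np.savetxt('final_array.txt', wolfram_text_to_array(open('chain.m').read()))` would then write the rows without braces, with no temporary file or shell pipeline.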
