How to separate elements of a line having multiple delimiters via Python?

Time:11-28

date Mon Jan 4 15:59:21.129 2021
base hex  timestamps absolute
no internal events logged
// version 13.0.0
//545285.973861 previous log file: Myfile_0.asc
// Measurement UUID: 4520e127-a0b6-48d2-9e23-2588160af285
545333.620639 LoggingString := "Log,11:28 PM, Sunday, January 10, 2021,11:28:17.4,34.72,12,0.01058,11.99,0.01077,12,0.01127,11.99,0.01142,11.76,0.1053,11.99,0.01076,11.96,0.01092,2.516,0,2,OM_2_1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
545335.691676 LoggingString := "Log,11:28 PM, Sunday, January 10, 2021,11:28:19.5,34.61,12,0.01058,11.99,0.01072,11.99,0.01127,11.99,0.01139,11.87,0.1118,12.01,0.01046,11.99,0.01145,2.581,0,2,OM_2_1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
545337.796715 LoggingString := "Log,11:28 PM, Sunday, January 10, 2021,11:28:21.6,34.52,11.99,0.0106,11.99,0.01077,11.99,0.01151,11.99,0.01139,11.72,0.1081,12,0.0109,11.96,0.01107,2.543,0,2,OM_2_1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
545339.919752 LoggingString := "Log,11:28 PM, Sunday, January 10, 2021,11:28:23.7,34.41,12,0.01082,11.99,0.01104,11.99,0.01156,11.99,0.01164,11.62,0.1042,11.99,0.01105,11.96,0.01126,2.596,0,2,OM_2_1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"

The text above is the input data from my log file:

I want to perform certain operations on this data, but I can't figure out how to split each line into its individual elements. The data that matters to me starts after 11:28:17.4. I have used numpy.genfromtxt with the usecols argument to print the data between columns 6 and 22; however, I want to split every element of the row so that I can use the elements as identifiers that tell me when to start recording the data I need.

For example, line 7 uses both whitespace and commas as separators. How do I split the data so that I end up with the following output:

List = ['545333.620639', 'LoggingString :=', 'Log', .........., 2021, 11:28:17.4, 34.72 .....]
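One way to get such a list is a regular expression that splits on whitespace up to the opening quote and on commas inside the quotes. This is a minimal sketch that assumes every data line has the shape <timestamp> <label> "<comma-separated fields>", as in the sample above; LINE_RE and split_log_line are hypothetical names, and every field comes back as a string.

```python
import re

# Hypothetical pattern for lines such as
# 545333.620639 LoggingString := "Log,11:28 PM, Sunday, ..."
# group 1: timestamp, group 2: label, group 3: text between the quotes
LINE_RE = re.compile(r'(\S+)\s+(.*?)\s*"(.*)"')


def split_log_line(line):
    """Split one LoggingString line into a flat list of strings.

    Whitespace separates the timestamp from the label, and commas separate
    the fields inside the quotes. Returns None for lines without that shape,
    such as the header lines.
    """
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        return None
    timestamp, label, payload = match.groups()
    # strip() removes the spaces that follow some commas (", Sunday", ", 2021")
    return [timestamp, label] + [field.strip() for field in payload.split(",")]
```

For the first data line this starts with ['545333.620639', 'LoggingString :=', 'Log', '11:28 PM', 'Sunday', 'January 10', '2021', '11:28:17.4', '34.72', ...]. Convert the numeric fields with float() afterwards if you need numbers.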

Also, when I use readlines(), is the data stored as one complete string in the list, or as individual string elements in the list?
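The readlines() method returns a list with one string per line of the file, and each string keeps its trailing newline; it is read() that returns the whole file as a single string. A quick check, using io.StringIO as a stand-in for the real .asc file (an assumption made only so the snippet runs without the file):

```python
import io

# io.StringIO stands in for open('C:/Documents/Myfile.asc', 'r'):
# both are text-mode file objects with the same readlines() method.
fake_file = io.StringIO(
    "// version 13.0.0\n"
    "// Measurement UUID: 4520e127-a0b6-48d2-9e23-2588160af285\n"
)
lines = fake_file.readlines()

print(type(lines), len(lines))  # a list with one str per line
print(repr(lines[0]))           # each element keeps its trailing '\n'
```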

Below is my current, rather hard-coded approach. It produces a .csv file containing specific data extracted from a larger dataset, but I want a better approach that addresses the following points:

  1. Instead of manually using a line number as a counter to decide when to start storing data in the .csv, I want to detect "// Measurement UUID:" and start storing data from the next line.

  2. Separate each line into individual elements.

  3. Define multiple delimiters for the np.genfromtxt function.
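On point 3: np.genfromtxt takes a single delimiter string (or field widths), so one workaround is to rewrite each line so that commas are the only separator, then pass those lines to genfromtxt, which also accepts a list or generator of strings instead of a file name. A sketch, assuming the data lines look like the LoggingString lines above; to_comma_line and load_columns are hypothetical helpers.

```python
import numpy as np


def to_comma_line(line):
    """Rewrite one LoggingString line so that ',' is its only separator.

    '545333.620639 LoggingString := "Log,11:28 PM,..."' becomes
    '545333.620639,LoggingString :=,Log,11:28 PM,...'
    """
    head, _, rest = line.strip().partition('"')
    timestamp, label = head.split(maxsplit=1)
    return ",".join([timestamp, label.strip(), rest.rstrip('"')])


def load_columns(lines, usecols):
    """Parse the selected columns of the data lines with a single ',' delimiter."""
    # Lines without a double quote (the header lines) are skipped
    data_lines = (to_comma_line(line) for line in lines if '"' in line)
    # genfromtxt accepts a generator of strings in place of a file name
    return np.genfromtxt(data_lines, delimiter=",", usecols=usecols)
```

Column numbers in usecols refer to the rewritten line: 0 is the timestamp, 1 the label, 7 the time (11:28:17.4) and 8 onwards the values after it. For example, inside with open('C:/Documents/Myfile.asc') as f:, calling load_columns(f, usecols=range(8, 26)) would keep the values that follow the time column; with the default float dtype, any non-numeric column you select comes back as nan.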

import numpy as np
import pandas as pd

with open('C:/Documents/Myfile.asc', 'r') as Testfile:
    Read_data = Testfile.readlines()

count = 0
for line in Read_data:
    count += 1
    if count < 7:  # counter to start saving data into .csv from the 7th line
        print("Line{}: {}".format(count, line.strip()))
    else:
        # Re-read the file, skipping the header lines, and keep columns 4 to 23
        mydat = np.genfromtxt("C:/Documents/Myfile.asc", skip_header=(count - 1),
                              usecols=(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23),
                              delimiter=',')
        Data_frame = pd.DataFrame(mydat)
        Data_frame.to_csv("Triall_3.csv", sep=';')
        break  # everything from line 7 onwards has been written; stop looping
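On point 1: instead of counting lines, a flag can be switched on when the "// Measurement UUID:" line is seen, and every matching line after it is split and written out. This sketch swaps in the standard csv module for genfromtxt plus pandas and assumes the same line layout as the sample; MARKER, LINE_RE, rows_after_marker and write_rows are hypothetical names.

```python
import csv
import re

MARKER = "// Measurement UUID:"  # start storing data on the line after this one
# Hypothetical pattern: <timestamp> <label> "<comma-separated fields>"
LINE_RE = re.compile(r'(\S+)\s+(.*?)\s*"(.*)"')


def rows_after_marker(lines, marker=MARKER):
    """Yield split rows for every data line that follows the marker line."""
    started = False
    for line in lines:
        line = line.strip()
        if not started:
            # Flip the flag once the marker is seen; the marker line itself is skipped
            started = line.startswith(marker)
            continue
        match = LINE_RE.fullmatch(line)
        if match:
            timestamp, label, payload = match.groups()
            yield [timestamp, label] + [f.strip() for f in payload.split(",")]


def write_rows(in_path, out_path, marker=MARKER):
    """Write the rows found after the marker to a ';'-separated .csv file."""
    with open(in_path, "r") as f_in, open(out_path, "w", newline="") as f_out:
        writer = csv.writer(f_out, delimiter=";")
        writer.writerows(rows_after_marker(f_in, marker))
```

For example, write_rows('C:/Documents/Myfile.asc', 'Triall_3.csv') would write one ';'-separated row per data line, with all values kept as text. Because the flag is driven by the marker rather than a line count, extra or missing header lines do not shift where recording starts.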


Answer:

If I've understood your question correctly, you can check whether the line contains LoggingString := and, if it does, split the string:

import pandas as pd

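out = []
with open("your_file.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        if "LoggingString :=" in line:
            # The comma-separated payload sits between the opening quote and the next quote
            first_quote = line.index('"')
            last_quote = line.index('"', first_quote + 1)
            out.append(
                # timestamp and "LoggingString :=" (split on the first whitespace run)...
                line[:first_quote].split(maxsplit=1)
                # ...followed by the comma-separated fields inside the quotes
                + line[first_quote + 1 : last_quote].split(",")
            )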

df = pd.DataFrame(out)
print(df)

Prints:

              0                  1    2         3        4            5      6           7      8      9        10     11       12     13       14     15       16     17      18     19       20     21       22     23 24 25      26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
0  545333.620639  LoggingString :=   Log  11:28 PM   Sunday   January 10   2021  11:28:17.4  34.72     12  0.01058  11.99  0.01077     12  0.01127  11.99  0.01142  11.76  0.1053  11.99  0.01076  11.96  0.01092  2.516  0  2  OM_2_1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1  545335.691676  LoggingString :=   Log  11:28 PM   Sunday   January 10   2021  11:28:19.5  34.61     12  0.01058  11.99  0.01072  11.99  0.01127  11.99  0.01139  11.87  0.1118  12.01  0.01046  11.99  0.01145  2.581  0  2  OM_2_1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2  545337.796715  LoggingString :=   Log  11:28 PM   Sunday   January 10   2021  11:28:21.6  34.52  11.99   0.0106  11.99  0.01077  11.99  0.01151  11.99  0.01139  11.72  0.1081     12   0.0109  11.96  0.01107  2.543  0  2  OM_2_1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3  545339.919752  LoggingString :=   Log  11:28 PM   Sunday   January 10   2021  11:28:23.7  34.41     12  0.01082  11.99  0.01104  11.99  0.01156  11.99  0.01164  11.62  0.1042  11.99  0.01105  11.96  0.01126  2.596  0  2  OM_2_1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
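Every cell in that DataFrame is a string. If the values after the time column are what you need for further calculations, here is a minimal sketch for converting them, assuming the column layout in the printed output above (column 7 is the time, columns 8 to 25 hold the numeric values before OM_2_1); numeric_block is a hypothetical helper.

```python
import pandas as pd


def numeric_block(df, start=8, stop=26):
    """Return columns start..stop-1 of the split-string DataFrame as numbers.

    The defaults follow the column numbering printed above: column 7 holds
    the time (e.g. 11:28:17.4), so columns 8-25 run from 34.72 up to the
    value just before 'OM_2_1'. Adjust them if your layout differs.
    """
    # pd.to_numeric is applied column by column; it raises on non-numeric text
    return df.iloc[:, start:stop].apply(pd.to_numeric)
```

For example, values = numeric_block(df) followed by values.mean() would give per-column averages of the numeric fields.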