Sort values from smallest to largest between the strings "Execution started" and "Execution ended"

<!--00:00:13:23 - Execution started - 01042141053 - B Side
00:02:59:10 - VAR 10.2 = 'W2300900009'
00:02:50:78 - VAR 8.2 = 'W2300900007' 
00:02:42:51 - VAR 6.2 = 'W2300900005' 
00:03:45:18 - Execution ended'
System in Power Counter = 49035:41:56 
'00:04:01:29 - Execution started - 01042141053 
'00:04:40:28 - VAR 4.2 = 'W2300900023'
'00:04:36:36 - VAR 3.2 = 'W2300900022'
'00:04:32:34 - VAR 2.2 = 'W2300900021'
'00:05:50:62 - Execution ended'

This is part of a workstation log, and I need to sort it by S/N (for example W2300900009) within each block from "Execution started" to "Execution ended". Do you know how to do this with the pandas library (Python)? In short: how do I reorder values from smallest to largest between two string values?

The result should look like this:

'00:00:13:23 - Execution started - 01042141053 - B Side'
'00:02:42:51 - VAR 6.2 = 'W2300900005'
'00:02:50:78 - VAR 8.2 = 'W2300900007'
'00:02:59:10 - VAR 10.2 = 'W2300900009'
'00:03:45:18 - Execution ended'
System in Power Counter = 49035:41:56
'00:04:01:29 - Execution started - 01042141053
'00:04:32:34 - VAR 2.2 = 'W2300900021'   
'00:04:36:36 - VAR 3.2 = 'W2300900022'
'00:04:40:28 - VAR 4.2 = 'W2300900023'
'00:05:50:62 - Execution ended'-->

CodePudding user response：

You need to iterate over the lines looking for 'Execution started' and 'Execution ended'.

Every line must appear in the output list. Lines between 'Execution started' and 'Execution ended' are included too, but sorted by serial number, which the following code assumes is the last token on the line and is surrounded by single quotes.

In which case:

s = """'00:00:13:23 - Execution started - 01042141053 - B Side'
'00:02:59:10 - VAR 10.2 = 'W2300900009'
'00:02:50:78 - VAR 8.2 = 'W2300900007' 
'00:02:42:51 - VAR 6.2 = 'W2300900005' 
'00:03:45:18 - Execution ended'
System in Power Counter = 49035:41:56 
'00:04:01:29 - Execution started - 01042141053 
'00:04:40:28 - VAR 4.2 = 'W2300900023'
'00:04:36:36 - VAR 3.2 = 'W2300900022'
'00:04:32:34 - VAR 2.2 = 'W2300900021'
'00:05:50:62 - Execution ended'"""

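output = []
started = False

for line in s.splitlines():
    if started:
        if 'Execution ended' in line:
            # Append the buffered block, sorted by the quoted serial number
            # (last token on the line, with the surrounding quotes stripped).
            output += sorted(slist, key=lambda x: x.split()[-1][1:-1])
            output.append(line)
            started = False
        else:
            slist.append(line)
    else:
        if 'Execution started' in line:
            started = True
            slist = []  # buffer for the lines of the current block
        output.append(line)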

print(*output, sep='\n')

Output:

'00:00:13:23 - Execution started - 01042141053 - B Side'
'00:02:42:51 - VAR 6.2 = 'W2300900005' 
'00:02:50:78 - VAR 8.2 = 'W2300900007' 
'00:02:59:10 - VAR 10.2 = 'W2300900009'
'00:03:45:18 - Execution ended'
System in Power Counter = 49035:41:56 
'00:04:01:29 - Execution started - 01042141053 
'00:04:32:34 - VAR 2.2 = 'W2300900021'
'00:04:36:36 - VAR 3.2 = 'W2300900022'
'00:04:40:28 - VAR 4.2 = 'W2300900023'
'00:05:50:62 - Execution ended'
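
If the serial number is not always the last token, or the numeric parts can differ in length, a regex with a numeric key may be more robust than slicing the last token. The following is only a sketch under assumptions taken from the sample log: `SERIAL_RE`, `serial_key` and `sort_blocks` are hypothetical names, and the serial is assumed to look like a quoted `W` followed by digits.

```python
import re

# Assumed serial format, based on the sample log: 'W' followed by digits, in single quotes.
SERIAL_RE = re.compile(r"'W(\d+)'")

def serial_key(line):
    """Return the numeric part of the serial number, or -1 if the line has none."""
    match = SERIAL_RE.search(line)
    return int(match.group(1)) if match else -1

def sort_blocks(lines):
    """Keep every line, but sort the lines between the start and end markers by serial."""
    output, block, started = [], [], False
    for line in lines:
        if started and "Execution ended" in line:
            output.extend(sorted(block, key=serial_key))
            output.append(line)
            started = False
        elif started:
            block.append(line)
        else:
            if "Execution started" in line:
                started, block = True, []
            output.append(line)
    # Assumption: a block that never reaches 'Execution ended' is emitted unsorted.
    if started:
        output.extend(block)
    return output
```

With the string `s` from above, `print(*sort_blocks(s.splitlines()), sep='\n')` should print the same lines as the loop. Comparing integers instead of strings only matters when serial numbers have different digit counts.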

CodePudding user response：

One option is to use read_csv, str.extract, and then sort_values:

import pandas as pd
import numpy as np

df = (pd.read_csv("logfile.txt", sep="/", header=None)
          .loc[lambda x: ~x[0].str.contains("Counter")]
          .reset_index(drop=True)
     )

m = df[0].str.contains("Execution ended")

df["S/N"] = np.where(m, "End", df[0].str.extract(r"VAR \d+\.\d+ = 'W(\d+)'", expand=False))
df["flag"] = m.cumsum().shift().fillna(0)

out = (df.groupby(df.pop("flag"), group_keys=False)
          .apply(lambda x: x.sort_values(by= "S/N", na_position="first"))
          .fillna({"S/N": "Start"}).rename({0:"logfile"}, axis=1))

As the row indexes in the output below show, the rows within each block have been reordered.

Output :

print(out)
                                                      logfile         S/N
0  <!--00:00:13:23 - Execution started - 01042141053 - B Side       Start
3                      00:02:42:51 - VAR 6.2 = 'W2300900005'   2300900005
2                      00:02:50:78 - VAR 8.2 = 'W2300900007'   2300900007
1                      00:02:59:10 - VAR 10.2 = 'W2300900009'  2300900009
4                              00:03:45:18 - Execution ended'         End
5             '00:04:01:29 - Execution started - 01042141053        Start
8                      '00:04:32:34 - VAR 2.2 = 'W2300900021'  2300900021
7                      '00:04:36:36 - VAR 3.2 = 'W2300900022'  2300900022
6                      '00:04:40:28 - VAR 4.2 = 'W2300900023'  2300900023
9                             '00:05:50:62 - Execution ended'         End
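
Note that the frame above no longer contains the "System in Power Counter" line, which the expected result in the question keeps. If that line has to stay in place, one possible variant is to sort only the rows that sit strictly between the two markers and write them back into the same positions. This is a rough sketch, not the answer's method: `sort_log_with_pandas` is a hypothetical name, the input is assumed to be a list of log lines, and the serial pattern is assumed from the sample.

```python
import pandas as pd

def sort_log_with_pandas(lines):
    """Return the log lines with each started/ended block sorted by serial number."""
    df = pd.DataFrame({"line": list(lines)})
    start = df["line"].str.contains("Execution started", regex=False)
    end = df["line"].str.contains("Execution ended", regex=False)

    # A row is inside a block when a start marker has been seen without a matching
    # end marker; the start row itself is excluded so it stays where it is.
    inside = (start.cumsum() - end.cumsum()).gt(0) & ~start

    # Assumed serial format: 'W' followed by digits, in single quotes.
    serial = pd.to_numeric(df["line"].str.extract(r"'W(\d+)'", expand=False))

    inner = df[inside].assign(block=start.cumsum()[inside], serial=serial[inside])
    ordered = inner.sort_values(["block", "serial"])

    # Write the sorted lines back into the positions the block rows occupied.
    df.loc[inside, "line"] = ordered["line"].to_numpy()
    return df["line"].tolist()
```

For the log in the question, something like `sort_log_with_pandas(open("logfile.txt").read().splitlines())` should return the lines in the requested order, with the counter line left between the two blocks.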