<!--00:00:13:23 - Execution started - 01042141053 - B Side
00:02:59:10 - VAR 10.2 = 'W2300900009'
00:02:50:78 - VAR 8.2 = 'W2300900007'
00:02:42:51 - VAR 6.2 = 'W2300900005'
00:03:45:18 - Execution ended'
System in Power Counter = 49035:41:56
'00:04:01:29 - Execution started - 01042141053
'00:04:40:28 - VAR 4.2 = 'W2300900023'
'00:04:36:36 - VAR 3.2 = 'W2300900022'
'00:04:32:34 - VAR 2.2 = 'W2300900021'
'00:05:50:62 - Execution ended'
This is part of a workstation log, and I need to sort it by S/N (for example W2300900009; the highlighted values) within each block from "Execution started" to "Execution ended". Do you know how to do this with the pandas library (Python)? In short: how do I reorder values from smallest to largest between two marker strings?
The result should look like this:
'00:00:13:23 - Execution started - 01042141053 - B Side'
'00:02:42:51 - VAR 6.2 = 'W2300900005'
'00:02:50:78 - VAR 8.2 = 'W2300900007'
'00:02:59:10 - VAR 10.2 = 'W2300900009'
'00:03:45:18 - Execution ended'
System in Power Counter = 49035:41:56
'00:04:01:29 - Execution started - 01042141053
'00:04:32:34 - VAR 2.2 = 'W2300900021'
'00:04:36:36 - VAR 3.2 = 'W2300900022'
'00:04:40:28 - VAR 4.2 = 'W2300900023'
'00:05:50:62 - Execution ended'-->
CodePudding user response:
You need to iterate over the lines, looking for 'Execution started' and 'Execution ended'.
Every line goes into the output list. Lines between 'Execution started' and 'Execution ended' also go into the output list, but sorted by serial number. The following code assumes the serial number is the last token on the line and is surrounded by single quotes.
In that case:
s = """'00:00:13:23 - Execution started - 01042141053 - B Side'
'00:02:59:10 - VAR 10.2 = 'W2300900009'
'00:02:50:78 - VAR 8.2 = 'W2300900007'
'00:02:42:51 - VAR 6.2 = 'W2300900005'
'00:03:45:18 - Execution ended'
System in Power Counter = 49035:41:56
'00:04:01:29 - Execution started - 01042141053
'00:04:40:28 - VAR 4.2 = 'W2300900023'
'00:04:36:36 - VAR 3.2 = 'W2300900022'
'00:04:32:34 - VAR 2.2 = 'W2300900021'
'00:05:50:62 - Execution ended'"""
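output = []
slist = []
started = False
for line in s.splitlines():
    if started:
        if 'Execution ended' in line:
            # Emit the buffered block, sorted by serial number (quotes stripped)
            output.extend(sorted(slist, key=lambda x: x.split()[-1][1:-1]))
            output.append(line)
            started = False
        else:
            # Inside a block: buffer the line until the block ends
            slist.append(line)
    else:
        if 'Execution started' in line:
            started = True
            slist = []
        # Lines outside a block (and the start marker) pass through unchanged
        output.append(line)
print(*output, sep='\n')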
Output:
'00:00:13:23 - Execution started - 01042141053 - B Side'
'00:02:42:51 - VAR 6.2 = 'W2300900005'
'00:02:50:78 - VAR 8.2 = 'W2300900007'
'00:02:59:10 - VAR 10.2 = 'W2300900009'
'00:03:45:18 - Execution ended'
System in Power Counter = 49035:41:56
'00:04:01:29 - Execution started - 01042141053
'00:04:32:34 - VAR 2.2 = 'W2300900021'
'00:04:36:36 - VAR 3.2 = 'W2300900022'
'00:04:40:28 - VAR 4.2 = 'W2300900023'
'00:05:50:62 - Execution ended'
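The same loop can be wrapped in a reusable function. The following is only a sketch: the names `sort_blocks`, `serial_key` and `SERIAL` are hypothetical, and it assumes that a serial number is a quoted token of the form 'W' followed by digits, and that all serial numbers have the same length (so plain string comparison orders them correctly).

```python
import re

# Assumption: a serial number looks like 'W2300900009' (quoted, W + digits).
SERIAL = re.compile(r"'(W\d+)'")


def serial_key(line):
    """Sort key: the serial number on the line, or '' if there is none."""
    match = SERIAL.search(line)
    return match.group(1) if match else ""


def sort_blocks(lines):
    """Return the lines with each started/ended block sorted by serial number."""
    output, block, started = [], [], False
    for line in lines:
        if started and "Execution ended" in line:
            # Close the block: emit its lines in serial-number order.
            output.extend(sorted(block, key=serial_key))
            output.append(line)
            block, started = [], False
        elif started:
            block.append(line)
        else:
            started = "Execution started" in line
            output.append(line)
    # A block that is never closed is emitted in its original order.
    output.extend(block)
    return output


log = [
    "'00:00:13:23 - Execution started - 01042141053 - B Side'",
    "'00:02:59:10 - VAR 10.2 = 'W2300900009'",
    "'00:02:42:51 - VAR 6.2 = 'W2300900005'",
    "'00:03:45:18 - Execution ended'",
    "System in Power Counter = 49035:41:56",
]
print(*sort_blocks(log), sep="\n")
```

Searching for the serial number with a regular expression, rather than taking the last token, means a trailing comment or extra whitespace on a line does not break the sort key.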
CodePudding user response:
One option is to use read_csv, str.extract, then sort_values:
import pandas as pd
import numpy as np

# Read one log line per row ("/" never occurs in the log) and drop the counter line
df = (pd.read_csv("logfile.txt", sep="/", header=None)
        .loc[lambda x: ~x[0].str.contains("Counter")]
        .reset_index(drop=True)
     )
m = df[0].str.contains("Execution ended")
# Serial-number digits for VAR lines, "End" for end markers, NaN for start markers
df["S/N"] = np.where(m, "End", df[0].str.extract(r"VAR \d+\.\d+ = 'W(\d+)'", expand=False))
# Block id: number of end markers seen before the current row
df["flag"] = m.cumsum().shift().fillna(0)
out = (df.groupby(df.pop("flag"), group_keys=False)
         .apply(lambda x: x.sort_values(by="S/N", na_position="first"))
         .fillna({"S/N": "Start"}).rename({0: "logfile"}, axis=1))
As the row indexes in the output below show, the lines within each block have been reordered.
Output:
print(out)
   logfile                                                     S/N
0  <!--00:00:13:23 - Execution started - 01042141053 - B Side  Start
3  00:02:42:51 - VAR 6.2 = 'W2300900005'                       2300900005
2  00:02:50:78 - VAR 8.2 = 'W2300900007'                       2300900007
1  00:02:59:10 - VAR 10.2 = 'W2300900009'                      2300900009
4  00:03:45:18 - Execution ended'                              End
5  '00:04:01:29 - Execution started - 01042141053              Start
8  '00:04:32:34 - VAR 2.2 = 'W2300900021'                      2300900021
7  '00:04:36:36 - VAR 3.2 = 'W2300900022'                      2300900022
6  '00:04:40:28 - VAR 4.2 = 'W2300900023'                      2300900023
9  '00:05:50:62 - Execution ended'                             End
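The grouping step relies on numbering each block by counting the 'Execution ended' markers seen so far and then delaying that count by one row. A minimal plain-Python sketch of that idea follows; the shortened sample lines and the names `ended`, `running` and `block_id` are illustrative only and are not part of the answer's code.

```python
from itertools import accumulate

# Shortened, illustrative log lines (timestamps omitted).
lines = [
    "Execution started",
    "VAR 10.2 = 'W2300900009'",
    "VAR 6.2 = 'W2300900005'",
    "Execution ended",
    "Execution started",
    "VAR 4.2 = 'W2300900023'",
    "Execution ended",
]

ended = ["Execution ended" in line for line in lines]

# Running count of end markers: the role of cumsum() on the boolean mask.
running = list(accumulate(int(flag) for flag in ended))

# Delay the count by one row and start at 0: the role of shift().fillna(0).
# Without the delay, each end marker would be numbered with the next block.
block_id = [0] + running[:-1]

print(running)   # [0, 0, 0, 1, 1, 1, 2]
print(block_id)  # [0, 0, 0, 0, 1, 1, 1]
```

Every row of a block, including its end marker, now shares one id, so sorting within each group cannot move a line across a block boundary.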