How to order syslog data in csv by time-CodePudding

I have syslog data in csv file and I am trying to sort this data by time.

Actually I am able to sort my csv data with the following code.

import csv
import sys, datetime, time
import re
from openpyxl import load_workbook
import openpyxl
import time

def XLSExport(Rows, SheetName, FileName):
    from openpyxl import Workbook
    wb = Workbook()

    ws = wb.active
    ws.title = SheetName
    # ws = wb.create_sheet(SheetName)
    for x in Rows:
        ws.append(x)

    wb.save(FileName)

ExcelExport = [["timestamp","source", "message"]]

rows = []
with open("All-Messages-search-result2.csv", 'r') as file:
    csvreader = csv.reader(file)
    header = next(csvreader)
    for row in csvreader:
        rows.append(row)
#print(header)
#print(rows)

excel_list = []

for row in rows:
    #print(row[0])
    row2 = row[0].split(";")
    #print(row2[0])
    #print(row2)
    excel_list.append(f"{row2[0]};{row2[1]};{row2[2]}")
    excel_list.sort()

# print(excel_list)

for sorted_ in excel_list:
    sorted2_ = sorted_.split(";")
    ExcelExport.append([sorted2_[0],sorted2_[1],sorted2_[2]])
    #print(sorted2_)

XLSExport(ExcelExport, "Messages-result2.xlsx", "Messages-result2.xlsx")

print("'Messages-result2.xlsx' file has been created. You may find sorted syslog files. Thanks for using this program.")

See the data before and after my code is run.

However, the problem is that my code takes too much time to sort the excel if the excel has too many data such as more than 200k lines.

I think that the problem which makes the code slower is that my code appends each line as list and sort it by time. Then it creates new excel by using the list so that it takes too much time if the data has too many lines.

Is it possible to make the process faster?

CodePudding user response：

My suggestions in my comments would be expressed in code as:

def XLSExport(Rows, SheetName, FileName):
    from openpyxl import Workbook
    wb = Workbook()

    ws = wb.active
    ws.title = SheetName
    for x in Rows:
        ws.append(x)

    wb.save(FileName)

ExcelExport = [["timestamp","source", "message"]]

rows = []

with open("All-Messages-search-result2.csv", 'r') as file:
    csvreader = csv.reader(file)
    header = next(csvreader)
    for row in csvreader:
        rows.append(row[0])

rows.sort()

for sorted_ in rows:
    sorted2_ = sorted_.split(";")
    ExcelExport.append([sorted2_[0],sorted2_[1],sorted2_[2]])
    #print(sorted2_)

XLSExport(ExcelExport, "Messages-result2.xlsx", "Messages-result2.xlsx")

So eliminate this:

for row in rows:
    #print(row[0])
    row2 = row[0].split(";")
    #print(row2[0])
    #print(row2)
    excel_list.append(f"{row2[0]};{row2[1]};{row2[2]}")
    excel_list.sort()

As this:

for row in csvreader:
        rows.append(row[0])

rows.sort()

does the same thing, if I am following correctly. In the process you eliminate one for loop and do the .sort() once instead of for each iteration of the loop.