I have this type of log file:
[2022-01-01 00:01:08,111][train][info] - {"epoch":99, "data_loss":"111.013", "data_ntokens":"123.672"," data_nsentences":"2", "data_nll_loss":"2.01"}
[2022-01-01 00:01:08,111][train][info] - {"epoch":100, "data_loss":"111.01", "data_ntokens":"123.672"," data_nsentences":"2", "data_nll_loss":"2.901"}
[2022-01-01 00:01:08,111][train][info] - {"epoch":102, "data_loss":"222.09", "data_ntokens":"123.600"," data_nsentences":"2", "data_nll_loss":"2.1"}
I would like to extract the information inside the curly braces, but the values vary in length, so I cannot rely on fixed string positions.
The DataFrame I am trying to get looks like this:
----------------------------------------------------------------------
| epoch | data_loss | data_ntokens | data_nsentences | data_nll_loss |
----------------------------------------------------------------------
|   99  |  111.013  |   123.672    |        2        |      2.01     |
.....
CodePudding user response:
You can read the log file and split each line on the '-' character to isolate the JSON part, then build a list of dictionaries with a list comprehension and create a pandas DataFrame from that list. Finally, as Will Zhao says, you can use tabulate to print the DataFrame in a pretty way. This is my approach:
import pandas as pd
import json
from tabulate import tabulate
with open("log_file.log", 'r') as f:
    lines = f.readlines()
# The timestamp contains two '-' characters, so the JSON payload is the fourth piece
new_dict = [json.loads(l.split('-')[3].strip()) for l in lines]
df = pd.DataFrame(new_dict).set_index("epoch")
print(tabulate(df, headers="keys", tablefmt="psql"))
Output:
+---------+-------------+----------------+--------------------+-----------------+
|   epoch |   data_loss |   data_ntokens |    data_nsentences |   data_nll_loss |
|---------+-------------+----------------+--------------------+-----------------|
|      99 |     111.013 |        123.672 |                  2 |           2.01  |
|     100 |     111.01  |        123.672 |                  2 |           2.901 |
|     102 |     222.09  |        123.6   |                  2 |           2.1   |
+---------+-------------+----------------+--------------------+-----------------+
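Splitting on every '-' relies on the timestamp containing exactly two hyphens and on the JSON containing none (a negative value would break it). A minimal sketch of a more tolerant variant follows, assuming each line separates the prefix from the JSON with " - " as in the sample. The function name parse_log_lines is hypothetical, and converting the string values to numbers is an optional extra, not part of the answer above.

```python
import json


def parse_log_lines(lines):
    """Return one dict per log line, parsed from the JSON that follows ' - '."""
    records = []
    for line in lines:
        # Split only on the first " - "; everything after it is the JSON payload
        _, _, payload = line.strip().partition(" - ")
        if not payload:
            continue  # blank line or line without the separator
        record = {}
        for key, value in json.loads(payload).items():
            # The values are logged as strings; convert them when they look numeric
            if isinstance(value, str):
                try:
                    value = int(value)
                except ValueError:
                    try:
                        value = float(value)
                    except ValueError:
                        pass  # leave non-numeric strings untouched
            # The sample log has a key with a leading space, so strip keys
            record[key.strip()] = value
        records.append(record)
    return records


sample = [
    '[2022-01-01 00:01:08,111][train][info] - {"epoch":99, "data_loss":"111.013", "data_ntokens":"123.672"," data_nsentences":"2", "data_nll_loss":"2.01"}',
    '',
    '[2022-01-01 00:01:08,111][train][info] - {"epoch":100, "data_loss":"111.01", "data_ntokens":"123.672"," data_nsentences":"2", "data_nll_loss":"2.901"}',
]
records = parse_log_lines(sample)
print(records[0])
```

The resulting list can be passed to pd.DataFrame(records).set_index("epoch") exactly as in the code above.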
CodePudding user response:
You could use tabulate or prettytable to display a prettified version of the DataFrame. You could also define your own f-string format to get a similar result.
Update: added a manual way to print a pretty table.
test = {"epoch":99, "data_loss":"111.013", "data_ntokens":"123.672"," data_nsentences":"2", "data_nll_loss":"2.01"}
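# Center every header in a cell two characters wider than the header itself
key_string = [i.center(2 + len(i)) for i in test.keys()]
keys_string = "|" + "|".join(key_string) + "|"
# Center every value in a cell of the same width as its header
value_string = [str(v).center(2 + len(k)) for k, v in test.items()]
values_string = "|" + "|".join(value_string) + "|"
divide_string = "+" + "-" * (len(keys_string) - 2) + "+"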
print(divide_string)
print(keys_string)
print(divide_string)
print(values_string)
print(divide_string)
Output:
+---------------------------------------------------------------------+
| epoch | data_loss | data_ntokens |  data_nsentences | data_nll_loss |
+---------------------------------------------------------------------+
|   99  |  111.013  |   123.672    |        2         |      2.01     |
+---------------------------------------------------------------------+
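The manual approach above prints a single dictionary and sizes each cell from the header alone. As a rough sketch of the f-string idea for several rows, the helper below sizes each column from its longest entry. The name format_table is hypothetical, and it assumes every row has the same keys in the same order.

```python
def format_table(rows):
    """Render a non-empty list of dicts (same keys, same order) as a text table."""
    # Strip keys because the sample log has one with a leading space
    headers = [str(k).strip() for k in rows[0]]
    cells = [[str(v) for v in row.values()] for row in rows]
    # Each column is as wide as its longest entry, header included
    widths = [
        max(len(h), *(len(r[i]) for r in cells))
        for i, h in enumerate(headers)
    ]
    divider = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(values):
        # "^" centers the text inside a field of the given width
        return "|" + "|".join(f" {v:^{w}} " for v, w in zip(values, widths)) + "|"

    return "\n".join([divider, line(headers), divider, *map(line, cells), divider])


rows = [
    {"epoch": 99, "data_loss": "111.013", "data_ntokens": "123.672", " data_nsentences": "2", "data_nll_loss": "2.01"},
    {"epoch": 100, "data_loss": "111.01", "data_ntokens": "123.672", " data_nsentences": "2", "data_nll_loss": "2.901"},
    {"epoch": 102, "data_loss": "222.09", "data_ntokens": "123.600", " data_nsentences": "2", "data_nll_loss": "2.1"},
]
table = format_table(rows)
print(table)
```

Unlike tabulate, this keeps the values as the strings found in the log, so "123.600" is printed unchanged rather than as 123.6.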