How do I extract specific data that varies in value from a file in Python

So I have a file that imports large amounts of data from a vision system. It saves the data in a text file and there are about 4000 lines of text with 1 line per result. I will add 1 of these lines below as an example.

11/02/1970; 11:56:44.000;ID;002914;Light Check;254;Tube Width1;38.7;Tube Width2;39.2;Tube Width3;39.9;Tube Width4;40.9;Tube Width5;41.2;Fixt Row;175.20;Fixt Col;211.23;Post Width;0.00;Blob Size;0;Left Angle;0.00;Right Angle;17.90;Dark Blob;0;Result;0;Global St;14;Tool Flag;31;Pallet No; 108;

What I want to do is extract, for each line, one of the parameters along with its value. Every field is separated by a ; delimiter, which is making it difficult for me.

For example, if I chose Light Check, I would get the Light Check result for each line, which in this case is 254. Can someone suggest some functions that could help me with this?

CodePudding user response：

To split that line into a dictionary, I would do:

s = "11/02/1970; 11:56:44.000;ID;002914;Light Check;254;Tube Width1;38.7;Tube Width2;39.2;Tube Width3;39.9;Tube Width4;40.9;Tube Width5;41.2;Fixt Row;175.20;Fixt Col;211.23;Post Width;0.00;Blob Size;0;Left Angle;0.00;Right Angle;17.90;Dark Blob;0;Result;0;Global St;14;Tool Flag;31;Pallet No; 108;"

s = s.split(";")
data = {key: val for key, val in zip(s[::2], s[1::2])}

Which returns

data
{'11/02/1970': ' 11:56:44.000', 'ID': '002914', 'Light Check': '254', 'Tube Width1': '38.7', 'Tube Width2': '39.2', 'Tube Width3': '39.9', 'Tube Width4': '40.9', 'Tube Width5': '41.2', 'Fixt Row': '175.20', 'Fixt Col': '211.23', 'Post Width': '0.00', 'Blob Size': '0', 'Left Angle': '0.00', 'Right Angle': '17.90', 'Dark Blob': '0', 'Result': '0', 'Global St': '14', 'Tool Flag': '31', 'Pallet No': ' 108'}

You can then do

data['Light Check']

to get

'254'
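
To apply the same idea to every line of the file, one possible sketch is shown here. The helper names parse_line and extract_parameter are assumptions made for illustration, and the second sample line is a made-up, shortened record. The sketch also strips whitespace around each field, since the value of 'Pallet No' carries a leading space.

```python
def parse_line(line):
    """Split one semicolon-delimited result line into a {name: value} dict."""
    fields = [field.strip() for field in line.strip().split(";")]
    # Fields alternate name, value. zip stops at the shorter slice, so the
    # empty string left by the trailing ';' is dropped.
    return dict(zip(fields[::2], fields[1::2]))


def extract_parameter(lines, name):
    """Return the value of `name` from each line (None where it is absent)."""
    return [parse_line(line).get(name) for line in lines]


# Shortened, illustrative sample lines (the second one is hypothetical).
sample = [
    "11/02/1970; 11:56:44.000;ID;002914;Light Check;254;Tube Width1;38.7;Pallet No; 108;",
    "11/02/1970; 11:56:45.000;ID;002915;Light Check;251;Tube Width1;38.9;Pallet No; 109;",
]
values = extract_parameter(sample, "Light Check")
print(values)  # ['254', '251']
```

Because extract_parameter only iterates over its first argument, an open file object can be passed in place of the list, so the 4000 lines never have to be held in memory at once.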

CodePudding user response：

I suggest using a regular expression (the re module here). Let the content of file.txt be

11/02/1970; 11:56:44.000;ID;002914;Light Check;254;Tube Width1;38.7;Tube Width2;39.2;Tube Width3;39.9;Tube Width4;40.9;Tube Width5;41.2;Fixt Row;175.20;Fixt Col;211.23;Post Width;0.00;Blob Size;0;Left Angle;0.00;Right Angle;17.90;Dark Blob;0;Result;0;Global St;14;Tool Flag;31;Pallet No; 108;

then

import re
with open("file.txt","r") as f:
    for line in f:
        print(re.search(r"Light Check;([0-9]+)",line).group(1))

Output:

254

Explanation: I iterate over the lines of the file one at a time (for line in f, so there is no need to load the whole file into memory), then in each line I find 1 or more (+) digits ([0-9]) after Light Check;. Note that the digits are inside ( and ), which is a capturing group (the first and only one) that I access using group(1). Disclaimer: this solution assumes that Light Check; followed by 1 or more digits is present in each line of file.txt. If a line does not match, re.search returns None and calling .group(1) raises AttributeError.
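
If some lines might lack the parameter, or the value might be a decimal such as 175.20, a more tolerant variant could look like the sketch below. The function name find_value is an assumption for illustration, and the sample line is a shortened copy of the one from the question.

```python
import re


def find_value(line, name):
    """Return the text after `name;` up to the next ';', or None if absent."""
    # re.escape guards against regex metacharacters in the parameter name.
    # (?:^|;) requires the name to start a field, so it cannot match the
    # tail of a longer name.
    match = re.search(r"(?:^|;)" + re.escape(name) + r";\s*([^;]*)", line)
    return match.group(1) if match else None


line = "11/02/1970; 11:56:44.000;ID;002914;Light Check;254;Fixt Row;175.20;Pallet No; 108;"
print(find_value(line, "Light Check"))  # 254
print(find_value(line, "Fixt Row"))     # 175.20
print(find_value(line, "Missing"))      # None
```

Capturing everything up to the next ; instead of only digits keeps the decimal point, and checking the match before calling group avoids the AttributeError on lines without the parameter.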

CodePudding user response：

Convert your data structure into a list of dicts:

import csv
from datetime import datetime

with open('data.txt') as fp:
    reader = csv.reader(fp, delimiter=';')
    data = []
    for row in reader:
        row = row[:-1]
        d = {'dt': datetime.strptime(row[0] + row[1], '%d/%m/%Y %H:%M:%S.%f'),
             'id': row[3]}

        keys = row[4::2]
        vals = map(float, row[5::2])
        d.update(dict(zip(keys, vals)))
        data.append(d)

Output:

>>> data
[{'dt': datetime.datetime(1970, 2, 11, 11, 56, 44),
  'id': '002914',
  'Light Check': 254.0,
  'Tube Width1': 38.7,
  'Tube Width2': 39.2,
  'Tube Width3': 39.9,
  'Tube Width4': 40.9,
  'Tube Width5': 41.2,
  'Fixt Row': 175.2,
  'Fixt Col': 211.23,
  'Post Width': 0.0,
  'Blob Size': 0.0,
  'Left Angle': 0.0,
  'Right Angle': 17.9,
  'Dark Blob': 0.0,
  'Result': 0.0,
  'Global St': 14.0,
  'Tool Flag': 31.0,
  'Pallet No': 108.0}]

Search with Python:

out = [rec for rec in data if rec.get('Light Check') == 254]

Search with Pandas:

import pandas as pd

df = pd.DataFrame(data)
out = df[df['Light Check'] == 254]
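
The question asks for the value of one parameter on every line rather than a filter on a known value. With a list of dicts like data, that could be sketched as follows. The three records are a small illustrative stand-in for the real data, and the names light_check and by_id are assumptions.

```python
# A small stand-in for the `data` list built above (values are illustrative).
data = [
    {"id": "002914", "Light Check": 254.0, "Tube Width1": 38.7},
    {"id": "002915", "Light Check": 251.0, "Tube Width1": 38.9},
    {"id": "002916", "Tube Width1": 39.0},  # record without 'Light Check'
]

# One value per record; .get returns None where the key is missing.
light_check = [rec.get("Light Check") for rec in data]
print(light_check)  # [254.0, 251.0, None]

# Pair each value with its record id, skipping records that lack the key.
by_id = {rec["id"]: rec["Light Check"] for rec in data if "Light Check" in rec}
print(by_id)  # {'002914': 254.0, '002915': 251.0}
```

With the DataFrame, selecting the column with df['Light Check'] gives the same per-line values.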