Home > Blockchain >  Pulling Specific strings and integers from CSV file and writing it to .txt file using python
Pulling Specific strings and integers from CSV file and writing it to .txt file using python

Time:12-11

NEW Update 11:30am CST Below is my full code

Desired results would be a txt file in this format:

Logical ID: (&) 192.168.xx.xxx (if it has both)

192.168.xx.xxx

Logical ID:

192.168.xx.xxx

192.168.xx.xxx

Logical ID:

Logical ID:

192.168.xx.xxx

**Latest Code-> I want to print the LOGICAL ID if one exists, if not I want it to print the IP Address (to a new form)

The code shows that I have specified a model.csv to write to a model.txt and have to manually change it model by model. So if there is a fix for that that would be great too

import csv
import re
import sys

sys.stdout = open("C:\\Users\\ADMIN-SURV\\Desktop\\data_pull\\2.0C-H4A-DC2 .txt", 'w')
with open('C:\\Users\\ADMIN-SURV\\Desktop\\data_pull\\2.0C-H4A-DC2_filter.csv') as fid:
    inputfile = csv.reader(fid)
    for row in inputfile:
        if len(row) >= 4:
            if row[0] == 'File name':
                # skip the header row
                continue
            m = re.match(r".*(.* [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})", row[3])
            if m:
                print(m.group(1))
            else:
                print(row[3])

sys.stdout.close()

*** error parsing line: model not found H4SL-D1(2305854) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.50.250 00:18:85:***

*** error parsing line: model not found H4SL-D1(2878617) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.50.194 00:18:85:***

Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:1026 *** error parsing line: model not found Unsupported SOUTH LV Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.206.250 00:18:85:***

Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:3027 *** error parsing line: model not found ELEVATOR GROUND FL Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.80.203 00:18:85:***

Of course, that is a small chunk of the return.

I want to add that the CSV file is actually specifically for the model specifically, so all I actually have to do is pull the logical ID's out of the first column and add them to a list under the defined mode and if there is no Logical ID attached, then return the IP address that is given.

I searched for all instances of an item and saved the results as a CSV file. I am using python to try and grab specific information. I would love to add a photo but I am not allowed.

This is the error i am receiving

Traceback (most recent call last): File "C:\Users\ADMIN-SURV\PycharmProjects\pdf_scraping\test_file.,py", line 7, in print(column[3]) IndexError: list index out of range

This is the only code I have written:

import csv

inputfile = csv.reader(open('C:\\Users\\ADMIN-SURV\\Desktop\\data_pull\\Untitled.csv','r'))

for column in inputfile:
   print(column[3])

When I get rid of the [3] in the last line and just leave

print(column)

It prints my entire CSV file in the console. All I want is specific information from each row and I can get that info by grabbing it from a specific column.

The CSV file data looks like this:

Search Results"

"Summary"
"Saved on","12/8/2021 1:57:21 PM"
"Searched for","Avigilon (ONVIF) 1.3C-H4SL-D1"
"In document","C:\Users\ADMIN-SURV\Desktop\data_pull\IslandView.pdf"
"Number of document(s) found","1"              
"Number of instance(s) found","551"            

"File name","Title","Page","Search Instance"
"IslandView.pdf","","5","Detection: Unsupported 2058 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2058 192.168.202.206 "
"IslandView.pdf","","9","BAR POS 01 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.70.214 00:18:85:"
"IslandView.pdf","","9","H4SL-D1(1866954) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:481 192.168.11.203 "
"IslandView.pdf","","9","H4SL-D1(1825930) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:20 192.168.16.203 "
"IslandView.pdf","","9","Detection: Unsupported 2048 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.202.200 00:18:85:"
"IslandView.pdf","","9","H4SL-D1(1866877) Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:92 192.168.15.205 "
"IslandView.pdf","","9","Detection: Unsupported 2074 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2074 192.168.203.241 "
"IslandView.pdf","","9","Detection: Unsupported 2174 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2174 192.168.201.232 "
"IslandView.pdf","","9","Detection: Unsupported 2161 Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2161 192.168.205.231 "

there are over 500 lines. You can see the column headers

"File name","Title","Page","Search Instance" 

All I want is the model number and logical ID info from the first column. I want to isolate then make an organized list of which logical ID is related to which Model.

Just in case, this is the Model Number and logical ID example from column 1

Avigilon (ONVIF) 1.3C-H4SL-D1 Logical ID: 875

The end goal is to create a sheet that has each model listed (this is the search results for one model) and beneath that list ALL logical ID's associated to that model.

Let me know if I can clarify or provide any further information.

Thanks!

CodePudding user response:

The top of your CSV file contains lines with less than 4 columns. To avoid the IndexError try testing the row length first:

# "inputfile" is a CSV reader instance
for row in inputfile:
    if len(row) >= 4:
        print(row[3])

Here's a variation using a regular expression to break out the model name, stopping at the IP address:

import re

with open('example_data.csv') as fid:
    inputfile = csv.reader(fid)
    for row in inputfile:
        if len(row) >= 4:
            if row[0] == 'File name':
                # skip the header row
                continue
            m = re.match(r'.*(Avigilon.* [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})', row[3])
            if m:
                print(m.group(1))
            else:
                print(f'*** error parsing line: model not found {row[3]}***')

which for the above data prints:

Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2058 192.168.202.206
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.70.214
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:481 192.168.11.203
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:20 192.168.16.203
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.202.200
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:92 192.168.15.205
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2074 192.168.203.241
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2174 192.168.201.232
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown Logical ID:2161 192.168.205.231
Avigilon (ONVIF) 1.3C-H4SL-D1 Unknown 192.168.50.246

To write the output to a text file, try something like this:

with open('logfile.txt', 'w') as fout:
    with open('example_data.csv') as fid:
        inputfile = csv.reader(fid)
        for row in inputfile:
            if len(row) >= 4:
                if row[0] == 'File name':
                    # skip the header row
                    continue
                m = re.match(r'.*(Avigilon.* [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})', row[3])
                if m:
                    # optional: delete this print line
                    print(m.group(1))
                    fout.write(f'{m.group(1)}\n')
                else:
                    print(f'*** error parsing line: model not found {row[3]}***')
  • Related