How to parse a CSV file in Python?

I need the first column of each row written to a variable, and the remaining columns (their number may vary) written to a list, so that I can pick the value I need from that list. I'm trying to extract e-mail addresses, but the table itself is messy, so every column has to be checked.

with open('data.csv', 'r', encoding='utf-8-sig', newline='') as file:
    reader = csv.reader(file)
    name = list(next(reader))

    for items in list(reader):
        for item in items:
            if '@' in item:
                if not item in emails:
                    emails.append(item)
                

    with open('result.csv', 'a', encoding='utf-8-sig', newline='') as file:
        writer = csv.writer(file, delimiter=';')
        for email in emails:
            writer.writerow(
                (
                    name,
                    email
                )
            )

Input:

Наименование,Описание,Адрес,Комментарий к адресу,Почтовый индекс,Микрорайон,Район,Город,Округ,Регион,Страна,Часы работы,Часовой пояс,Телефон 1,E-mail 1,Веб-сайт 1,Instagram 1,Twitter 1,Facebook 1,ВКонтакте 1,YouTube 1,Skype 1,Широта,Долгота,2GIS URL
Магазин автозапчастей,,"Мира, 007",,655153,,,Черногорск,Черногорск городской округ,Республика Хакасия,Россия,Пн: 09:00-18:00; Вт: 09:00-18:00; Ср: 09:00-18:00; Чт: 09:00-18:00; Пт: 09:00-18:00; Сб: 09:00-18:00, 07:00,89130502009,[email protected],http://avtomagazin.2gis.biz,,,,,,,53.805192,91.334047,https://2gis.com/firm/9711414977516651
Спектр-Авто,автотехцентр,"Вяткина, 4",1 этаж,655017,,,Абакан,Абакан городской округ,Республика Хакасия,Россия,Пн: 09:00-18:00; Вт: 09:00-18:00; Ср: 09:00-18:00; Чт: 09:00-18:00; Пт: 09:00-18:00; Сб: 09:00-18:00, 07:00,89233931771, [email protected],http://spectr-avto.2gis.biz,,,,,,,53.716581,91.45005,https://2gis.com/firm/70000001034136187

The result is:

['Наименование', 'Описание', 'Адрес', 'Комментарий к адресу', 'Почтовый индекс', 'Микрорайон', 'Район', 'Город', 'Округ', 'Регион', 'Страна', 'Часы работы', 'Часовой пояс', 'Телефон 1', 'E-mail 1', 'Веб-сайт 1', 'Instagram 1', 'Twitter 1', 'Facebook 1', 'ВКонтакте 1', 'YouTube 1', 'Skype 1', 'Широта', 'Долгота', '2GIS URL'];[email protected]
['Наименование', 'Описание', 'Адрес', 'Комментарий к адресу', 'Почтовый индекс', 'Микрорайон', 'Район', 'Город', 'Округ', 'Регион', 'Страна', 'Часы работы', 'Часовой пояс', 'Телефон 1', 'E-mail 1', 'Веб-сайт 1', 'Instagram 1', 'Twitter 1', 'Facebook 1', 'ВКонтакте 1', 'YouTube 1', 'Skype 1', 'Широта', 'Долгота', '2GIS URL']; [email protected]
['Наименование', 'Описание', 'Адрес', 'Комментарий к адресу', 'Почтовый индекс', 'Микрорайон', 'Район', 'Город', 'Округ', 'Регион', 'Страна', 'Часы работы', 'Часовой пояс', 'Телефон 1', 'E-mail 1', 'Веб-сайт 1', 'Instagram 1', 'Twitter 1', 'Facebook 1', 'ВКонтакте 1', 'YouTube 1', 'Skype 1', 'Широта', 'Долгота', '2GIS URL'];[email protected]

CodePudding user response：

If I understand the question correctly, you want the output to be a two-column CSV: names in the first column, which I assume come from the first column of the original CSV, and e-mail addresses in the second column.

If my assumptions are correct, this should work for you:

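import csv

with open('data.csv', 'r', encoding='utf-8-sig', newline='') as infile:
    reader = csv.reader(infile)
    header = next(reader)  # the first row holds the column titles

    emails = []  # unique (name, email) pairs, in order of appearance
    for items in reader:
        name = items[0]  # the name comes from the row's first column
        for item in items:
            # any cell containing '@' is treated as an e-mail address
            if '@' in item and (name, item) not in emails:
                emails.append((name, item))

# mode 'a' appends, so repeated runs add rows to an existing result.csv
with open('result.csv', 'a', encoding='utf-8-sig', newline='') as outfile:
    writer = csv.writer(outfile, delimiter=';')
    for email in emails:
        writer.writerow(email)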

Output:

Магазин автозапчастей;[email protected]
Спектр-Авто; [email protected]

Things I have changed in your code:

The input CSV header is now read into header - did you want to do anything with that?
The name is now set from items[0] for each row in the input CSV.
The emails list is now a list of (name, email) pairs.
Optimization detail: you don't need to turn reader into a list to iterate over it. Writing for items in reader: is more efficient, because each row is processed as it is read instead of all rows being stored in a list first.
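
The changes listed above can be sketched as a small function. This variant also trims whitespace around each address (the second sample row has a space before the e-mail), checks only the columns after the first, and uses a set for the duplicate check. The function name extract_emails and the in-memory sample data are illustrative assumptions, not part of the original code.

```python
import csv
import io


def extract_emails(lines):
    """Return unique (name, email) pairs from an iterable of CSV lines.

    The first row is treated as a header and skipped. The first column
    of every other row is the name; all remaining columns are checked.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if there is one

    seen = set()   # fast membership test for duplicates
    pairs = []     # keeps the pairs in order of appearance
    for row in reader:  # rows are processed one at a time as they are read
        if not row:
            continue  # ignore blank lines
        name = row[0]
        for cell in row[1:]:
            email = cell.strip()
            if '@' in email and (name, email) not in seen:
                seen.add((name, email))
                pairs.append((name, email))
    return pairs


# Hypothetical sample data standing in for data.csv
sample = io.StringIO(
    'Name,Phone,E-mail 1,Website\n'
    'Shop A,123, a@example.com,http://a.example\n'
    'Shop B,456,b@example.com,b@example.com\n'
)
pairs = extract_emails(sample)
print(pairs)
```

With a real file, the same function could be called as extract_emails(infile) inside the with open(...) block, since an open text file is also an iterable of lines.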

CodePudding user response：

import petl

table = petl.fromcsv('data.csv', encoding='utf-8-sig')
table2 = petl.addfield(table, 'email_address', lambda r: [r[r1] for r1 in petl.header(table) if '@' in r[r1]])
table3 = petl.cut(table2, 'Наименование', 'email_address')
petl.tocsv(table3, 'result.csv', encoding='utf-8-sig', delimiter=';', write_header=True)

Load the CSV into a table.
Create a new field (column) that collects the value of every field containing an e-mail address.
Reduce (cut) the table to the two fields that matter: 'Наименование' and 'email_address'.
Write the result to a CSV file.

Output:

Наименование;email_address
Магазин автозапчастей;['[email protected]']
Спектр-Авто;[' [email protected]']

Be sure to install petl:

pip install petl
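
If installing petl is not an option, roughly the same four steps can be sketched with the standard library's csv.DictReader, which also gives access to each value by its header name. In this sketch the addresses are joined into one plain string instead of being written as a Python list. The function name names_with_emails and the sample data are assumptions made for illustration.

```python
import csv
import io


def names_with_emails(src, name_field):
    """Yield (name, [emails]) for each row, using the header names as keys."""
    for record in csv.DictReader(src):
        emails = [
            value.strip()
            for value in record.values()
            # short or long rows can produce None or list values; skip those
            if isinstance(value, str) and '@' in value
        ]
        yield record[name_field], emails


# Hypothetical sample data standing in for data.csv
source = io.StringIO(
    'Name,E-mail 1,Website\n'
    'Shop A,a@example.com,http://a.example\n'
    'Shop B, b@example.com,\n'
)

# Written to an in-memory buffer here; a real run would open result.csv instead
result = io.StringIO()
writer = csv.writer(result, delimiter=';', lineterminator='\n')
writer.writerow(['Name', 'email_address'])
for name, emails in names_with_emails(source, 'Name'):
    writer.writerow([name, ', '.join(emails)])

print(result.getvalue())
```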