Conditional reading next rows in csv file with python

Time:05-21

I have a csv file, in which data looks somewhat like this:

users      | some ids
user1      | 1,2,3,4,5
empty cell | 6,7,8
empty cell | 123,1890,345
user2      | 555,444,333
empty cell | 11,22,33

Since the ids in rows where the users cell is empty belong to the last username above them in that column, I would like to get a dictionary entry for each user, like this:

{'user1':[1,2,3,4,5,6,7,8,123,1890,345]}

I'm using Python with csv.DictReader:

reader = csv.DictReader(infile, delimiter=',', quotechar='"')
for row in list(reader):
    if row['users'].startswith("user"):
        for id in get_id_list(row["some ids"]):
            update_dict(dict, row['users'], id)

Right now I'm only getting {'user1':[1,2,3,4,5]}. Is there a good way to check whether the first cell in a row is empty and loop with that condition?

CodePudding user response:

This assumes that the first data row always has a user. It remembers the most recent user and updates it only when a row names a new one. Something like this should work:

reader = csv.DictReader(infile, delimiter=',', quotechar='"')
current_user=None
for row in list(reader):
    if row['users'].startswith("user"):
        current_user=row['users']
    for id in get_id_list(row["some ids"]):
        update_dict(dict,current_user,id)
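
Since get_id_list and update_dict are not shown in the question, here is a minimal self-contained sketch of the same idea. It assumes the file is comma-delimited with the ids quoted in one cell, and that an "empty cell" really is an empty string; the SAMPLE text and the group_ids name are made up for illustration, not taken from the original code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: an empty "users" cell means the ids belong to the user above.
SAMPLE = '''users,some ids
user1,"1,2,3,4,5"
,"6,7,8"
,"123,1890,345"
user2,"555,444,333"
,"11,22,33"
'''

def group_ids(infile):
    """Collect ids per user, carrying the last seen user name forward."""
    ids_by_user = defaultdict(list)
    current_user = None
    reader = csv.DictReader(infile, delimiter=',', quotechar='"')
    for row in reader:
        user = (row['users'] or '').strip()
        if user:  # non-empty first cell: a new user block starts here
            current_user = user
        if current_user is None:
            continue  # ids before the first user have no owner, so skip them
        cell = row['some ids'] or ''
        ids_by_user[current_user].extend(
            int(x) for x in cell.split(',') if x.strip()
        )
    return dict(ids_by_user)

result = group_ids(io.StringIO(SAMPLE))
print(result['user1'])  # [1, 2, 3, 4, 5, 6, 7, 8, 123, 1890, 345]
```

Testing for an empty cell instead of startswith("user") also covers usernames that do not begin with "user". With a real file, pass the opened file object in place of io.StringIO(SAMPLE).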

CodePudding user response:

I am not familiar with the csv library, but you can get the same result by reading the file line by line and appending each row's ids to a defaultdict under the most recent user, switching to a new key whenever a row names a new user.

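A defaultdict is like a dict, but it creates a default value (here an empty list) the first time a missing key is accessed, so you don't have to check for the key yourself. You can read more about it in the Python documentation for collections.defaultdict.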
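
As a small illustration of that difference (the variable names here are only for the example):

```python
from collections import defaultdict

plain = {}
try:
    plain['user1'].append(1)  # a plain dict raises KeyError for a missing key
except KeyError:
    plain['user1'] = [1]      # so the key has to be created by hand

grouped = defaultdict(list)
grouped['user1'].append(1)       # the missing key is created as an empty list first
grouped['user1'].extend([2, 3])  # later rows just keep extending the same list

print(plain)          # {'user1': [1]}
print(dict(grouped))  # {'user1': [1, 2, 3]}
```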

from collections import defaultdict

d = defaultdict(list)
current_key = None
with open(f_path) as f:  # f_path: path to the pipe-delimited file shown in the question
    for line in f:
        if line.startswith('users') or not line.strip():
            continue  # skip the header row and blank lines
        key, val = [part.strip() for part in line.split('|')]
        if key.startswith('user'):
            current_key = key  # a new user block starts here
        if current_key is None:
            continue  # ids before the first user have no owner
        d[current_key].extend(val.split(','))

Output

defaultdict(list,
            {'user1': ['1',
              '2',
              '3',
              '4',
              '5',
              '6',
              '7',
              '8',
              '123',
              '1890',
              '345'],
             'user2': ['555', '444', '333', '11', '22', '33']})
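
Note that the ids above are strings, while the question asked for integers. One possible way to convert them afterwards is sketched below; the grouped dict just repeats the output above so the snippet runs on its own.

```python
# Same data as the output above, written as a plain dict for the example.
grouped = {
    'user1': ['1', '2', '3', '4', '5', '6', '7', '8', '123', '1890', '345'],
    'user2': ['555', '444', '333', '11', '22', '33'],
}

# Convert every id string to an int, keeping the per-user grouping.
as_ints = {user: [int(x) for x in ids] for user, ids in grouped.items()}

print(as_ints['user1'])  # [1, 2, 3, 4, 5, 6, 7, 8, 123, 1890, 345]
```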