I have a CSV file in which the data looks something like this:
users | some ids
user1 | 1,2,3,4,5
empty cell | 6,7,8
empty cell | 123,1890,345
user2 | 555,444,333
empty cell | 11,22,33
The ids in rows where the users cell is empty belong to the last username above them in that column, so I would like to get a dictionary entry for each user, like this:
{'user1':[1,2,3,4,5,6,7,8,123,1890,345]}
I'm using Python with csv.DictReader:
reader = csv.DictReader(infile, delimiter=',', quotechar='"')
for row in list(reader):
    if row['users'].startswith("user"):
        for id in get_id_list(row["some ids"]):
            update_dict(dict, row['users'], id)
With this I only get {'user1':[1,2,3,4,5]}. Is there a good way to check whether the first cell in a row is empty and loop with that condition?
CodePudding user response:
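This assumes that the first data row WILL have a user. It remembers the user from the most recent row that has one and only updates it when a new user appears, so rows with an empty users cell are credited to the user above them. Something like this should work: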
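reader = csv.DictReader(infile, delimiter=',', quotechar='"')
current_user = None
for row in list(reader):
    # Only update the remembered user when the cell actually holds one
    if row['users'].startswith("user"):
        current_user = row['users']
    # Rows with an empty users cell fall through to the last remembered user
    for id in get_id_list(row["some ids"]):
        update_dict(dict, current_user, id)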
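For reference, here is a self-contained sketch of the same idea. It assumes the file is a real comma-delimited CSV in which each id list is quoted so that it stays in one cell, and that an empty users cell is literally empty. Since the definitions of get_id_list and update_dict are not shown in the question, it substitutes str.split and a plain dict for them; collect_ids and the sample data are hypothetical names made up for illustration.

```python
import csv
import io

# Hypothetical sample data: comma-delimited, with the id list quoted so that
# it stays in a single cell. An empty first cell means "same user as above".
SAMPLE = '''users,some ids
user1,"1,2,3,4,5"
,"6,7,8"
,"123,1890,345"
user2,"555,444,333"
,"11,22,33"
'''


def collect_ids(infile):
    """Group ids by user, carrying the last seen user down over empty cells."""
    result = {}
    current_user = None
    for row in csv.DictReader(infile, delimiter=',', quotechar='"'):
        user = row['users'].strip()
        if user:  # non-empty cell: a new user starts here
            current_user = user
        if current_user is None:  # ids before any user: nothing to attach to
            continue
        ids = [int(x) for x in row['some ids'].split(',') if x.strip()]
        result.setdefault(current_user, []).extend(ids)
    return result


ids_by_user = collect_ids(io.StringIO(SAMPLE))
print(ids_by_user)
```

Testing for a non-empty cell rather than startswith("user") keeps the sketch working when usernames do not share a common prefix. With a real file, pass the open file object instead of the io.StringIO wrapper.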
CodePudding user response:
I am not familiar with the csv library, but you can get the result by reading the file in normally and appending each row's ids to a defaultdict under the current user, updating that user whenever a new one appears.
A defaultdict is like a dict, but it creates a default value (here an empty list) for a missing key instead of raising KeyError. You can read more about them in the Python documentation for the collections module.
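from collections import defaultdict

d = defaultdict(list)
current_key = None
with open(f_path) as f:  # f_path is the path to the input file
    for line in f:
        if not line.startswith('users'):  # skip the header row
            if line.startswith('user'):
                # A new user starts here: remember it and store its ids
                key, val = [_.strip() for _ in line.strip().split('|')]
                current_key = key
                d[current_key].extend(val.strip().split(','))
            else:
                # No user seen yet, so there is nothing to attach the ids to
                if current_key is None:
                    continue
                # Empty users cell: the ids belong to the last user seen
                key, val = [_.strip() for _ in line.strip().split('|')]
                d[current_key].extend(val.strip().split(','))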
Output:
defaultdict(list,
            {'user1': ['1', '2', '3', '4', '5', '6', '7', '8', '123', '1890', '345'],
             'user2': ['555', '444', '333', '11', '22', '33']})
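Note that the ids in this output are strings, because they come straight from str.split, while the question's expected result uses integers. If integers are wanted, a conversion step could look like the sketch below. The d defined here only restates the output shown above so that the snippet runs on its own; ids_by_user is a name chosen for illustration.

```python
from collections import defaultdict

# Stand-in for the `d` built above, restated so this snippet is self-contained.
d = defaultdict(list)
d['user1'].extend(['1', '2', '3', '4', '5', '6', '7', '8', '123', '1890', '345'])
d['user2'].extend(['555', '444', '333', '11', '22', '33'])

# Convert every id to int and drop the defaultdict wrapper.
ids_by_user = {user: [int(i) for i in ids] for user, ids in d.items()}
print(ids_by_user)
```

Converting to a plain dict at the end also means that looking up an unknown user raises KeyError instead of silently creating an empty list.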