I'm fairly new to Python, and I am trying to convert a dictionary comprehension (from Hands-on Data Analysis with Pandas by S. Molin) into a "normal" for loop, purely for practice.
Initially, the data comes from a CSV file and is loaded using NumPy. The result is a structured array in which each CSV row is a single record (of type numpy.void), as follows:
array([('2018-10-13 11:10:23.560', '262km NW of Ozernovskiy, Russia', 'mww', 6.7, 'green', 1), ('2018-10-13 04:34:15.580', '25km E of Bitung, Indonesia', 'mww', 5.2, 'green', 0), ('2018-10-13 00:13:46.220', '42km WNW of Sola, Vanuatu', 'mww', 5.7, 'green', 0), ('2018-10-12 21:09:49.240', '13km E of Nueva Concepcion, Guatemala', 'mww', 5.7, 'green', 0), ('2018-10-12 02:52:03.620', '128km SE of Kimbe, Papua New Guinea', 'mww', 5.6, 'green', 1)], dtype=[('time', '<U23'), ('place', '<U37'), ('magType', '<U3'), ('mag', '<f8'), ('alert', '<U5'), ('tsunami', '<i4')])
What I am trying to do is convert it into a dictionary that maps each column name to an array of that column's values:
{'time': array(['2018-10-13 11:10:23.560', '2018-10-13 04:34:15.580','2018-10-13 00:13:46.220', '2018-10-12 21:09:49.240', '2018-10-12 02:52:03.620'], dtype='<U23'), 'place': array(['262km NW of Ozernovskiy, Russia', '25km E of Bitung, Indonesia', '42km WNW of Sola, Vanuatu','13km E of Nueva Concepcion, Guatemala','128km SE of Kimbe, Papua New Guinea'], dtype='<U37'), 'magType': array(['mww', 'mww', 'mww', 'mww', 'mww'], dtype='<U3'), 'mag': array([6.7, 5.2, 5.7, 5.7, 5.6]), 'alert': array(['green', 'green', 'green', 'green', 'green'], dtype='<U5'), 'tsunami': array([1, 0, 0, 0, 1])}
The dictionary comprehension used for this purpose is:
array_dict = {col: np.array([row[i] for row in data]) for i, col in enumerate(data.dtype.names)}
The solution I got so far is:
d = {}
for i,col in enumerate(data.dtype.names):
    for row in data:
        d[col].append(row[i])
I get the following error:
---------
KeyError                                  Traceback (most recent call last)
Input In [51], in <cell line: 2>()
      2 for i,col in enumerate(data.dtype.names):
      3     for row in data:
----> 4         d[col].append(row[i])
KeyError: 'time'
I have researched a bit online, and it could be related to the data type of the "time" column. My guess, though I am pretty sure I am wrong, is that in the comprehension each column is created as a NumPy array directly, whereas here I am not setting it up as one beforehand (hence the problem with the data type).
Any help would be highly appreciated. Many thanks!
CodePudding user response:
To produce the same result as the dictionary comprehension that you've provided:
d = {}
for i, col in enumerate(data.dtype.names):
    values = []
    for row in data:
        values.append(row[i])
    d[col] = np.array(values)
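To check that the loop really matches the comprehension, here is a minimal self-contained sketch. The two-row `data` array is a hypothetical sample that only mimics the field layout from the question; it is not the real CSV.

```python
import numpy as np

# Hypothetical two-row sample with the same field layout as the question's data.
data = np.array(
    [
        ("2018-10-13 11:10:23.560", "262km NW of Ozernovskiy, Russia", "mww", 6.7, "green", 1),
        ("2018-10-13 04:34:15.580", "25km E of Bitung, Indonesia", "mww", 5.2, "green", 0),
    ],
    dtype=[("time", "<U23"), ("place", "<U37"), ("magType", "<U3"),
           ("mag", "<f8"), ("alert", "<U5"), ("tsunami", "<i4")],
)

# The comprehension from the question.
array_dict = {col: np.array([row[i] for row in data])
              for i, col in enumerate(data.dtype.names)}

# The equivalent explicit loop: collect one column in a list, then convert it.
d = {}
for i, col in enumerate(data.dtype.names):
    values = []
    for row in data:
        values.append(row[i])
    d[col] = np.array(values)

# Compare both dictionaries column by column.
same = all(np.array_equal(d[col], array_dict[col]) for col in data.dtype.names)
print(same)
```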
The error you're getting is due to the fact that your dictionary d is empty (you created it with d = {}), so it does not contain the key 'time'. You could create the key like this: d['time'] = some_value, but you can't access a key that doesn't exist yet.
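A small sketch of that behavior, independent of NumPy. The timestamp string is just an illustrative value taken from the question, and `setdefault` is shown as one possible alternative rather than the required fix:

```python
d = {}

try:
    # d["time"] is looked up before .append() runs, and the key is missing.
    d["time"].append("2018-10-13 11:10:23.560")
except KeyError as err:
    missing_key = err.args[0]
    print("KeyError:", missing_key)

# Create the key with an empty list first; after that, appending works.
d["time"] = []
d["time"].append("2018-10-13 11:10:23.560")

# Alternative: setdefault creates the empty list only if the key is missing.
d.setdefault("place", []).append("25km E of Bitung, Indonesia")
print(d)
```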
If you want, you can use collections.defaultdict. With it, you don't have to create the keys yourself: when you access a non-existent key, the default factory (here, list) is called, and its result is stored under that key and returned.
With your original code it would look like this:
from collections import defaultdict
d = defaultdict(list)
for i, col in enumerate(data.dtype.names):
    for row in data:
        d[col].append(row[i])
dict(d)
However, the values of your dictionary are then not np.ndarrays, but plain lists.
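If you do want arrays at the end, one option is to convert each list once the loop has finished. This is a minimal sketch; the three-row, two-field `data` array is a hypothetical stand-in for the real data:

```python
from collections import defaultdict

import numpy as np

# Hypothetical sample with only two of the question's fields.
data = np.array(
    [(6.7, 1), (5.2, 0), (5.7, 0)],
    dtype=[("mag", "<f8"), ("tsunami", "<i4")],
)

# Same loop as above: missing keys start out as empty lists.
d = defaultdict(list)
for i, col in enumerate(data.dtype.names):
    for row in data:
        d[col].append(row[i])

# Convert every list to a NumPy array in a plain dict.
array_dict = {col: np.array(values) for col, values in d.items()}
print(array_dict)
```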