I have data in a Python list, pulled from an SQLite database, in the following format:
# This is an example
data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)]
From this list of tuples, I want to create a 2D numpy array, converting each tuple to a row. While doing so, I also want to be able to specify transformations on the data. Specifically, I want the value at index 1 of each tuple converted from a string to a number, and the value at the last index converted to 0 if it is None and to 1 otherwise.
Here is what the example data should look like afterwards:
transformed_data = np.asarray([[1, 12345, 1, 0, 0], [1, 34567, 1, 1, 0]])
I am able to do this with simple for loops, but I'd like to know if there is a more Pythonic solution, either with native numpy methods or otherwise. I am working with a very large database, so performance matters. Thanks in advance.
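For comparison, here is a minimal sketch of a column-wise NumPy approach, assuming every row has exactly five fields and that index 1 always holds a valid integer string. It transposes the rows once with `zip(*rows)` and then converts whole columns instead of individual values. `transform` is a hypothetical helper name, not part of any library.

```python
import numpy as np


def transform(rows):
    """Convert rows of (int, str, int, int, value-or-None) into an int64 array."""
    if not rows:
        return np.empty((0, 5), dtype=np.int64)
    # zip(*rows) transposes the list of row tuples into one tuple per column
    cols = list(zip(*rows))
    out = np.empty((len(rows), 5), dtype=np.int64)
    out[:, 0] = cols[0]
    # index 1: build a string array and parse all of it to integers in one call
    out[:, 1] = np.array(cols[1]).astype(np.int64)
    out[:, 2] = cols[2]
    out[:, 3] = cols[3]
    # last index: True/False for "not None", stored as 1/0 in the int64 array
    out[:, 4] = [v is not None for v in cols[4]]
    return out


data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)]
result = transform(data)
```

Whether this beats a plain loop depends on the data, so it is worth timing on a realistic sample before committing to it.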
Answer:
pandas is quite good at this:
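import pandas as pd
# set up DataFrame (columns are labelled 0..4)
df = pd.DataFrame(data)
# last column: 0 if None, 1 otherwise
df[4] = df[4].notna().astype(int)
# convert every column to numeric; unparseable strings become NaN
df = df.apply(pd.to_numeric, errors='coerce')
# replace any remaining nulls with 0 and cast to integer
# (fillna's downcast= keyword is deprecated in recent pandas)
transformed_data = df.fillna(0).astype(int).to_numpy()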
Output:
array([[    1, 12345,     1,     0,     0],
       [    1, 34567,     1,     1,     0]])
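Since the rows come from SQLite anyway, another option is to let the database do the conversions so NumPy receives rows that are already all integers. This is a sketch only: the table `items` and the columns `a` to `e` are hypothetical placeholders for the real schema, and an in-memory database stands in for the real file.

```python
import sqlite3

import numpy as np

# In-memory database with a hypothetical table "items" standing in for the real one
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (a INTEGER, b TEXT, c INTEGER, d INTEGER, e TEXT)")
con.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)],
)

# CAST turns the text column into an integer; "e IS NOT NULL" evaluates to 1 or 0
query = "SELECT a, CAST(b AS INTEGER), c, d, e IS NOT NULL FROM items ORDER BY rowid"
rows = con.execute(query).fetchall()
transformed_data = np.array(rows, dtype=np.int64)
con.close()
```

`CAST(... AS INTEGER)` and `IS NOT NULL` are ordinary SQLite expressions, so the Python side only has to build the array.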
Answer:
If your tuples are small and of a fixed size, you can use a list comprehension and pass the result to np.array:
import numpy as np
result = np.array([(a, int(b), c, d, 0 if e is None else 1) for a, b, c, d, e in data])
Or a little shorter:
result = np.array([(d[0], int(d[1]), *d[2:4], int(d[4] is not None)) for d in data])
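For very large result sets, one variation on the comprehension avoids building an intermediate list of tuples: feed a generator to `np.fromiter` and reshape the flat result. This minimal sketch assumes exactly five fields per row and an all-integer result. `rows_to_array` is a hypothetical name. Because it only needs an iterable, a `sqlite3` cursor could be passed in place of the list.

```python
from itertools import chain

import numpy as np


def rows_to_array(rows):
    # Lazily apply the per-row conversions; no list of converted tuples is built
    converted = ((a, int(b), c, d, int(e is not None)) for a, b, c, d, e in rows)
    # Flatten the rows into one stream of ints and let NumPy fill a 1-D buffer
    flat = np.fromiter(chain.from_iterable(converted), dtype=np.int64)
    # Restore the 2-D shape: one row per input tuple, five columns each
    return flat.reshape(-1, 5)


data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)]
result = rows_to_array(data)
```

The per-value conversions still run in Python, so the main gain is lower peak memory rather than fewer operations.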