Create numpy array from Python list with operations

I have data in a Python list that I'm pulling from a database (SQLite) in the following format:

# This is an example
data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)]

From this list of tuples, I want to create a 2D numpy array, converting each tuple into a row. While doing so, I also want to apply transformations to the data: the values at index 1 should be converted from strings to numbers, and the values at the last index should become 0 if None and 1 otherwise.

Here is what the example data should look like afterwards:

transformed_data = np.asarray([[1, 12345, 1, 0, 0], [1, 34567, 1, 1, 0]])

I am able to do this with simple for loops, but I'd like to know if there is a more Pythonic solution, either with native numpy methods or otherwise. I am working with a very large database, so performance matters. Thanks in advance.

Answer:

pandas is quite good at this:

import pandas as pd

df = pd.DataFrame(data)             # set up DataFrame
df[1] = pd.to_numeric(df[1])        # convert the strings at index 1 to numbers
df[4] = df[4].notna().astype(int)   # None -> 0, anything else -> 1
transformed_data = df.to_numpy()    # convert to numpy array

output:

array([[    1, 12345,     1,     0,     0],
       [    1, 34567,     1,     1,     0]])
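For comparison, the same column-by-column idea can be sketched with NumPy alone. This is a minimal sketch, not part of the answer above: rows_to_array is a hypothetical helper name, it assumes data is a non-empty list of 5-tuples shaped like the question's example, and it has not been benchmarked against the pandas version.

```python
import numpy as np


def rows_to_array(rows):
    """Hypothetical helper: build an int64 array one column at a time.

    Assumes a non-empty list of 5-tuples shaped like the question's data.
    """
    # Transpose the list of rows into five column tuples.
    c0, c1, c2, c3, c4 = zip(*rows)
    out = np.empty((len(c0), 5), dtype=np.int64)
    out[:, 0] = c0
    # Build a fixed-width string array, then let NumPy parse it to int64;
    # a non-numeric string raises an error instead of being coerced.
    out[:, 1] = np.array(c1).astype(np.int64)
    out[:, 2] = c2
    out[:, 3] = c3
    # None -> 0, anything else -> 1 (booleans are stored as 0/1).
    out[:, 4] = [v is not None for v in c4]
    return out


data = [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)]
transformed_data = rows_to_array(data)
```

The zip(*rows) transpose still touches every value once in Python, so whether this beats a plain list comprehension depends on the data and is worth measuring on the real table.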

Answer:

If your tuples are small and of a fixed size, you can use a list comprehension and pass the result to np.array:

import numpy as np
result = np.array([(a, int(b), c, d, 0 if e is None else 1) for a, b, c, d, e in data])

Or, indexing into each tuple instead of unpacking it:

result = np.array([(d[0], int(d[1]), *d[2:4], 0 if d[4] is None else 1) for d in data])
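Since the rows come from SQLite, another option (not from either answer) is to let the database do both conversions in the query, so NumPy receives rows that are already integers. The sketch below assumes a hypothetical table items with columns a through e standing in for the real schema, which the question doesn't show.

```python
import sqlite3

import numpy as np

# Hypothetical in-memory table standing in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (a INTEGER, b TEXT, c INTEGER, d INTEGER, e TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    [(1, '12345', 1, 0, None), (1, '34567', 1, 1, None)],
)

# CAST turns the text column into an integer, and "e IS NOT NULL"
# evaluates to 1 or 0, so every value arrives as a Python int.
rows = conn.execute(
    "SELECT a, CAST(b AS INTEGER), c, d, e IS NOT NULL FROM items ORDER BY rowid"
).fetchall()

transformed_data = np.array(rows, dtype=np.int64)
conn.close()
```

One caveat: SQLite's CAST does not raise on malformed text (CAST('abc' AS INTEGER) yields 0), so bad values in column b would pass through silently, unlike int() in the list comprehension.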