AttributeError: 'tuple' object has no attribute 'lower' when returning specific

I have an array like this:

array([[('Weather1', 57), 428, '74827'],
       [('weather1', 57), 429, '74828'],
       [('weather1', 57), 409, '74808'],
       [('weather2', 57), 11553, '76568'],
       [('weather2', 57), 11573, '76574']])

I want to collect only the [2] values into a new array, grouped by the values in [0].

Final outcome:

array([['74827', '74828', '74808'], ['76568', '76574']])

I use this code:

read_data = [] # stores Weather1, Weather2 etc. as we read that
final_array = [] # stores final arrays

# stores data for weather1, then clears it out and
# then stores data for weather2, and so on...
sub_array = [] 

# read each item of array
for x in array:

    # e.g. for first row, is Weather1 already read?
    # No, it's not read
    if x[0].lower() not in read_data:

        # when you reach weather 2 and hit this statement,
        # sub_array will have data from weather1. So, if you find
        # sub_array with data, it is time to add it to the final_array
        # and start fresh with the sub_array
        if len(sub_array) > 0:
            final_array.append(sub_array)
            sub_array = [x[2]]
        # if sub_array is empty, just add data to it
        else:
            sub_array.append(x[2])
        
        # make sure that read_data contains the item you read
        read_data.append(x[0].lower())

    # if weather1 has been read already, just add item to sub_array
    else:
        sub_array.append(x[2])

# After you are done reading all the lines, sub_array may have data in it
# if so, add it to the final array
if len(sub_array) > 0:
    final_array.append(sub_array)

However, because index 0 of each row is a tuple, I get:

AttributeError: 'tuple' object has no attribute 'lower'

Any ideas on how to fix it?

CodePudding user response：

Cast it to str first?

str(x[0]).lower()
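
Casting to str works for grouping because the whole tuple becomes a string such as "('weather1', 57)", but it also folds the number into the key. Here is a minimal sketch of an alternative, assuming the first element of each tuple is the name you want to group by: index into the tuple before calling lower(), and collect the [2] values in a dict. The names rows and groups are only illustrative and do not appear in the question.

```python
# Illustrative data copied from the question; a plain list of rows behaves
# the same as the object array when iterated row by row.
rows = [
    [('Weather1', 57), 428, '74827'],
    [('weather1', 57), 429, '74828'],
    [('weather1', 57), 409, '74808'],
    [('weather2', 57), 11553, '76568'],
    [('weather2', 57), 11573, '76574'],
]

groups = {}  # lowercased name -> list of [2] values, in first-seen order
for x in rows:
    key = x[0][0].lower()  # x[0] is the tuple, x[0][0] is the name string
    groups.setdefault(key, []).append(x[2])

final_array = list(groups.values())
print(final_array)
# [['74827', '74828', '74808'], ['76568', '76574']]
```

Because the dict collects by key, this version does not depend on the rows for one name being next to each other.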

CodePudding user response：

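You can do this in a much shorter way by combining np.unique and np.split. Note that this relies on the rows already being ordered so that each group is contiguous and the groups appear in sorted key order, as they are in your example:

import numpy as np

# a is the object array from the question; a[:, 0] holds the (name, number) tuples
_, counts = np.unique([tup[0].lower() for tup in a[:, 0]], return_counts=True)
splits = np.split(a, counts.cumsum()[:-1])
splits = [s[:, 2].tolist() for s in splits]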

Output:

>>> splits
[['74827', '74828', '74808'], ['76568', '76574']]
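
To see why counts.cumsum()[:-1] gives the right cut points, here is a sketch of the same arithmetic in plain Python, assuming the lowercased keys are already in sorted, contiguous order as in the question's data. The variable names (keys, values, cut_points, bounds) are made up for the illustration; Counter and accumulate stand in for np.unique(..., return_counts=True) and cumsum.

```python
from collections import Counter
from itertools import accumulate

keys = ['weather1', 'weather1', 'weather1', 'weather2', 'weather2']
values = ['74827', '74828', '74808', '76568', '76574']

# Per-key counts in sorted key order, mirroring what np.unique returns.
tally = Counter(keys)
counts = [tally[k] for k in sorted(tally)]      # [3, 2]

# Running totals mark where each group ends; the last total is just
# len(values), so it is dropped, as cumsum()[:-1] does.
cut_points = list(accumulate(counts))[:-1]      # [3]

# Slice between consecutive cut points, like np.split.
bounds = [0, *cut_points, len(values)]
splits = [values[i:j] for i, j in zip(bounds, bounds[1:])]
print(splits)
# [['74827', '74828', '74808'], ['76568', '76574']]
```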

CodePudding user response：

import numpy as np
import pandas as pd

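# dtype=object keeps the tuples intact; recent NumPy versions raise a
# ValueError for mixed tuple/scalar rows like these without it
data = np.array([[('Weather1', 57), 428, '74827'],
                 [('weather1', 57), 429, '74828'],
                 [('weather1', 57), 409, '74808'],
                 [('weather2', 57), 11553, '76568'],
                 [('weather2', 57), 11573, '76574']], dtype=object)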

df = pd.DataFrame(data)

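# Replace each (name, number) tuple with its lowercased name,
# so that "Weather1" and "weather1" fall into the same group
df[0] = df[0].apply(lambda x: x[0].lower())

# groupby yields (key, sub-frame) pairs; keep column 2 of each sub-frame
newdata = [group[2].values for _, group in df.groupby(0)]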

print(newdata)

[array(['74827', '74828', '74808'], dtype=object), array(['76568', '76574'], dtype=object)]