I have an array like this:
array([[('Weather1', 57), 428, '74827'],
       [('weather1', 57), 429, '74828'],
       [('weather1', 57), 409, '74808'],
       [('weather2', 57), 11553, '76568'],
       [('weather2', 57), 11573, '76574']], dtype=object)
I want to collect only the values at index [2] into a new array, grouped by the values at index [0].
Final outcome:
array([['74827', '74828', '74808'], ['76568', '76574']])
I use this code:
read_data = []    # stores Weather1, Weather2 etc. as we read them
final_array = []  # stores final arrays
# stores data for weather1, then clears it out and
# then stores data for weather2, and so on...
sub_array = []
# read each item of array
for x in array:
    # e.g. for first row, is Weather1 already read?
    # No, it's not read
    if x[0].lower() not in read_data:
        # when you reach weather2 and hit this statement,
        # sub_array will have data from weather1. So, if you find
        # sub_array with data, it is time to add it to the final_array
        # and start fresh with the sub_array
        if len(sub_array) > 0:
            final_array.append(sub_array)
            sub_array = [x[2]]
        # if sub_array is empty, just add data to it
        else:
            sub_array.append(x[2])
        # make sure that read_data contains the item you read
        read_data.append(x[0].lower())
    # if weather1 has been read already, just add item to sub_array
    else:
        sub_array.append(x[2])
# After you are done reading all the lines, sub_array may have data in it;
# if so, add it to the final array
if len(sub_array) > 0:
    final_array.append(sub_array)
However, because index 0 is a tuple, I get:
AttributeError: 'tuple' object has no attribute 'lower'
Any ideas on how to fix it?
CodePudding user response:
Cast it to str first?
    str(x[0]).lower()
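Another option, sketched here under the assumption that the grouping key is the name inside the tuple, is to index into the tuple with x[0][0] before calling lower() and to collect the values in a dict. The names rows and groups are illustrative and not from the question; the rows are written as a plain list of lists, though the same loop should also work when iterating over a 2-D object array.

```python
# Sample rows copied from the question (assumed here to be a plain list of lists).
rows = [
    [('Weather1', 57), 428, '74827'],
    [('weather1', 57), 429, '74828'],
    [('weather1', 57), 409, '74808'],
    [('weather2', 57), 11553, '76568'],
    [('weather2', 57), 11573, '76574'],
]

# Maps the lower-cased name to the list of x[2] values.
# Dicts keep insertion order, so groups come out in first-seen order.
groups = {}
for x in rows:
    key = x[0][0].lower()  # x[0] is the tuple, x[0][0] is the name string
    groups.setdefault(key, []).append(x[2])

final_array = list(groups.values())
print(final_array)
# [['74827', '74828', '74808'], ['76568', '76574']]
```

Unlike the original loop, this does not depend on rows of the same group being next to each other.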
CodePudding user response:
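You can do this in a much shorter and more efficient way by combining np.unique and np.split. This assumes a is a 2-D NumPy object array whose rows are already grouped in sorted key order:
    import numpy as np
    # lower-case the name inside each tuple, then count rows per name
    _, counts = np.unique(np.array([tup[0].lower() for tup in a[:, 0]]), return_counts=True)
    # cut the array at the cumulative group boundaries
    splits = np.split(a, counts.cumsum()[:-1])
    # keep only column 2 of each group
    splits = [s[:, 2].tolist() for s in splits]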
Output:
>>> splits
[['74827', '74828', '74808'], ['76568', '76574']]
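The np.split approach cuts the array at run boundaries, so it relies on rows of the same group sitting next to each other. A rough pure-Python sketch of the same run-based idea uses itertools.groupby; the helper name_key and the variable rows are hypothetical names introduced for illustration.

```python
from itertools import groupby

# Sample rows copied from the question (assumed here to be a plain list of lists).
rows = [
    [('Weather1', 57), 428, '74827'],
    [('weather1', 57), 429, '74828'],
    [('weather1', 57), 409, '74808'],
    [('weather2', 57), 11553, '76568'],
    [('weather2', 57), 11573, '76574'],
]


def name_key(row):
    """Hypothetical helper: lower-cased name taken from the tuple in column 0."""
    return row[0][0].lower()


# groupby starts a new group each time the key changes between consecutive rows.
splits = [[row[2] for row in run] for _, run in groupby(rows, key=name_key)]
print(splits)
# [['74827', '74828', '74808'], ['76568', '76574']]
```

If the rows are not already adjacent per group, sorting them first with sorted(rows, key=name_key) would be needed before grouping.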
CodePudding user response:
import numpy as np
import pandas as pd
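# dtype=object keeps the tuples intact; recent NumPy versions refuse to
# build an array from these mixed rows without it
data = np.array([[('Weather1', 57), 428, '74827'],
                 [('weather1', 57), 429, '74828'],
                 [('weather1', 57), 409, '74808'],
                 [('weather2', 57), 11553, '76568'],
                 [('weather2', 57), 11573, '76574']], dtype=object)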
df = pd.DataFrame(data)
# Fix uppercase "Weather"
df[0] = df[0].apply(lambda x: x[0].lower())
newdata = [group[1].loc[:, 2].values for group in df.groupby(0)]
print(newdata)
Output:
[array(['74827', '74828', '74808'], dtype=object), array(['76568', '76574'], dtype=object)]