Concatenating and sorting

import numpy as np
import pandas as pd

cols = [2,4,6,8,10,12,14,16,18] # the columns I want to work with
df = pd.read_csv('mywork.csv')
df1 = df.iloc[:, cols]
b = np.array(df1)
b

outcome

array([['WV5 6NY', 'RE4 9VU', 'BU4 N90', 'TU3 5RE', 'NE5 4F'],
       ['SA8 7TA', 'BA31 0PO', 'DE3 2FP', 'LR98 4TS', nan],
       ['MN0 4NU', 'RF5 5FG', 'WA3 0MN', 'EA15 8RE', 'BE1 4RE'],
       ['SB7 0ET', 'SA7 0SB', 'BT7 6NS', 'TA9 0LP', nan]], dtype=object)

a = np.concatenate(b) #concatenated to get a single array, this worked well

print(np.sort(a)) # to sort alphabetically
It gave me the error **AxisError: axis -1 is out of bounds for array of dimension 0**


I also tried using a.sort(), which gives me **TypeError: '<' not supported between instances of 'float' and 'str'**

The above comes from a CSV file containing the postcodes visited by different people, who travel from one postcode to another for different jobs; a person could travel to 5 postcodes a day. Using a NumPy array, I got a list of lists of postcodes.

I then concatenated the lists of postcodes into one big list, which I want to sort alphabetically, but it keeps giving me errors.

Please, can someone help?

CodePudding user response：

As was mentioned in the comments, this error is caused by comparing nan (a float) to a string. NumPy cannot sort an array that mixes the two, so the nan values have to be removed before sorting. One way is to go through a list:

Convert the array to a list
Remove the nan values
Sort

import numpy as np

# Get the data (in your scenario, this would be achieved by reading from your file)
b = np.array([['WV5 6NY', 'RE4 9VU', 'BU4 N90', 'TU3 5RE', 'NE5 4F'],
       ['SA8 7TA', 'BA31 0PO', 'DE3 2FP', 'LR98 4TS', np.nan],
       ['MN0 4NU', 'RF5 5FG', 'WA3 0MN', 'EA15 8RE', 'BE1 4RE'],
       ['SB7 0ET', 'SA7 0SB', 'BT7 6NS', 'TA9 0LP', np.nan]], dtype=object)

# Flatten
a = np.concatenate(b)

# Remove nan values - in an object array they are still floats, so keep only the strings
a = np.array([x for x in a if isinstance(x, str)])

# Finally, sort (in place)
a.sort()
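
If you would rather not loop in Python, the same three steps (flatten, drop nan, sort) can be sketched with a boolean mask. This is only an illustration: the DataFrame below stands in for df.iloc[:, cols], and the column names stop1 to stop5 are made up for the example.

```python
import numpy as np
import pandas as pd

# Stand-in for df.iloc[:, cols]; the column names are hypothetical.
df1 = pd.DataFrame(
    [['WV5 6NY', 'RE4 9VU', 'BU4 N90', 'TU3 5RE', 'NE5 4F'],
     ['SA8 7TA', 'BA31 0PO', 'DE3 2FP', 'LR98 4TS', np.nan],
     ['MN0 4NU', 'RF5 5FG', 'WA3 0MN', 'EA15 8RE', 'BE1 4RE'],
     ['SB7 0ET', 'SA7 0SB', 'BT7 6NS', 'TA9 0LP', np.nan]],
    columns=['stop1', 'stop2', 'stop3', 'stop4', 'stop5'],
)

# Flatten the 2-D values into a single 1-D array
flat = df1.to_numpy().ravel()

# pd.isna marks the missing entries; keep everything else
postcodes = flat[~pd.isna(flat)]

# With only strings left, sorting no longer compares float with str
sorted_postcodes = np.sort(postcodes)
print(sorted_postcodes)
```

pd.isna is used here instead of comparing against the string 'nan' because the missing values are real floats, not text, so a string comparison would not catch them.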