import numpy as np
import pandas as pd

cols = [2,4,6,8,10,12,14,16,18] # the columns I want to work with
df = pd.read_csv('mywork.csv')
df1 = df.iloc[:, cols]
b = np.array(df1)
b
Output:
array([['WV5 6NY', 'RE4 9VU', 'BU4 N90', 'TU3 5RE', 'NE5 4F'],
['SA8 7TA', 'BA31 0PO', 'DE3 2FP', 'LR98 4TS', nan],
['MN0 4NU', 'RF5 5FG', 'WA3 0MN', 'EA15 8RE', 'BE1 4RE'],
['SB7 0ET', 'SA7 0SB', 'BT7 6NS', 'TA9 0LP', nan]], dtype=object)
a = np.concatenate(b) #concatenated to get a single array, this worked well
print(np.sort(a)) # to sort alphabetically
This gave me the error **AxisError: axis -1 is out of bounds for array of dimension 0**.
I also tried a.sort(), which fails with **TypeError: '<' not supported between instances of 'float' and 'str'**.
The CSV file above contains the postcodes that different people travel between for their jobs; a person can travel to 5 postcodes a day. Loading it into a NumPy array gave me a list of lists of postcodes.
I then concatenated these into one big list of postcodes, which I want to sort alphabetically, but I keep getting errors.
Can someone please help?
CodePudding user response:
As mentioned in the comments, this error is caused by comparing nan (a float) with strings. Sorting only works once every remaining element is a string, so the nan values have to be filtered out first, for example with a list comprehension:
- Flatten the array into a single sequence
- Remove the nan values
- Sort
import numpy as np

# Get the data (in your scenario, this would be achieved by reading from your file)
b = np.array([['WV5 6NY', 'RE4 9VU', 'BU4 N90', 'TU3 5RE', 'NE5 4F'],
              ['SA8 7TA', 'BA31 0PO', 'DE3 2FP', 'LR98 4TS', np.nan],
              ['MN0 4NU', 'RF5 5FG', 'WA3 0MN', 'EA15 8RE', 'BE1 4RE'],
              ['SB7 0ET', 'SA7 0SB', 'BT7 6NS', 'TA9 0LP', np.nan]], dtype=object)
# Flatten
a = np.concatenate(b)
# Remove nan values - in an object array they are still floats, so keep only the strings
a = np.array([x for x in a if isinstance(x, str)])
# Finally, sort
a.sort()
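If you need this in more than one place, the flatten, filter and sort steps can be wrapped in a small helper. The following is only a sketch under a few assumptions: `sorted_postcodes` is a hypothetical name, the array is a shortened sample of the postcodes above, and every valid entry is assumed to be a Python `str` (anything else, including float nan, is dropped). It also reproduces the original failure, so you can see that the mixed float/str comparison is what raises the `TypeError`.

```python
import numpy as np


def sorted_postcodes(table):
    """Flatten a 2-D object array and return its string entries, sorted."""
    flat = np.asarray(table, dtype=object).ravel()
    # Keep only real strings; float nan placeholders are dropped here.
    keep = np.array([isinstance(x, str) for x in flat], dtype=bool)
    return np.sort(flat[keep])


# Shortened sample of the data from the question (assumption: same layout).
b = np.array([['WV5 6NY', 'RE4 9VU', np.nan],
              ['SA8 7TA', 'BA31 0PO', 'DE3 2FP']], dtype=object)

# Sorting the raw values fails because nan is a float, not a string.
try:
    sorted(b.ravel())
    mixed_sort_failed = False
except TypeError:
    mixed_sort_failed = True

result = sorted_postcodes(b)
print(result)
```

Filtering with `isinstance(x, str)` rather than comparing against the text 'nan' avoids relying on how NumPy converts missing values, which depends on the array's dtype.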