Python: find the unique values in a specific column of a 2D array

Good day.

If I have the following array:

[11, "apples", 22, 11], [12, "pear", 24, 11], [13, "bannana", 18, 11], [14, "pear", 17, 11]

How can I change the array to only show data from user pear? I want to collect all the values from column 1 of user pear. (12, 14)

Or, alternatively, how can I find the unique values in column 2, e.g. apples, pear and bannana, and then filter by pear to get only the pear rows: [12, "pear", 24, 11], [14, "pear", 17, 11]?

What I have tried, in various forms:

uniqueRows = np.unique(array, axis=:,1)

This is what I can use to filter if I have the unique values.

new_arr = np.array([[11, "apples", 22, 11], [12, "pear", 24, 11], [13, "bannana", 18, 11], [14, "pear", 17, 11]])
new_val = np.array(["pear"])
result = np.in1d(new_arr[:, 1], new_val)
z = new_arr[result]

CodePudding user response：

Pandas Way

import numpy as np
import pandas as pd

new_arr = np.array([[11, "apples", 22, 11], [12, "pear", 24, 11], [13, "banana", 18, 11], [14, "pear", 17, 11]])

df = pd.DataFrame(new_arr,columns=['A','B','C','D'])

result = df[df.B=='pear']
print(result)
'''
    A     B   C   D
1  12  pear  24  11
3  14  pear  17  11
'''
#or

result_2 = df['B'].drop_duplicates()
print(result_2)
'''
0    apples
1      pear
2    banana
'''

Instead of drop_duplicates() you can also use unique(), which returns a plain array of the values rather than a Series. If speed matters, time both on your own data.
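
One thing to watch for: building the DataFrame from a NumPy array that mixes ints and strings turns every value into a string. Here is a minimal sketch, assuming you can build the frame directly from the list of lists instead, which keeps column A numeric and also collects the column A values for pear. The names rows, unique_names and pear_ids are only illustrative.

```python
import pandas as pd

# Same data as above, but as a plain list of lists so ints stay ints.
rows = [[11, "apples", 22, 11], [12, "pear", 24, 11],
        [13, "banana", 18, 11], [14, "pear", 17, 11]]
df = pd.DataFrame(rows, columns=["A", "B", "C", "D"])

# Unique values of column B, in order of first appearance.
unique_names = df["B"].unique().tolist()

# Column A values for the rows where column B is "pear".
pear_ids = df.loc[df["B"] == "pear", "A"].tolist()

print(unique_names)
print(pear_ids)
```

The .loc call combines the row filter and the column selection in one step, so no intermediate frame is needed.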

CodePudding user response：

Based on your question, it appears that using numpy isn't necessarily a requirement, so this is how you could do it with a plain list comprehension. Feel free to select a different answer if numpy is a hard requirement.

I'll assume the input is called data.

First, we iterate through each item in the list and pick out its word (the element at index 1):

[item[1] for item in data]

This generates a list of the words (the second element of each subarray):

['apples', 'pear', 'bannana', 'pear']

But we don't really want that list as our result; we want to check it to get our result. Using the same for...in syntax, we can test each item as we iterate:

[item[1] for item in data if item[1] == 'pear']

Gets us

['pear', 'pear']

So now we have filtered down to all pear sub-arrays. But we really want the first item (or possibly the whole object) so we can just change the index of what we're "selecting" at the beginning of our for...in list comprehension.

[item[0] for item in data if item[1] == 'pear']

That will give us

[12, 14]

If you do want the whole item, you can just choose not to index the result at all

[item for item in data if item[1] == 'pear']

[[12, 'pear', 24, 11], [14, 'pear', 17, 11]]
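
The question also asks for the unique values in the second column. Here is a rough sketch of that part in plain Python, assuming the same data list as above. The names unique_names and ids_by_name are made up for illustration.

```python
data = [[11, "apples", 22, 11], [12, "pear", 24, 11],
        [13, "bannana", 18, 11], [14, "pear", 17, 11]]

# dict.fromkeys drops duplicates while keeping first-seen order
# (a set would also work, but loses the order).
unique_names = list(dict.fromkeys(item[1] for item in data))

# Group the first-column values by name, so every name can be
# looked up without filtering the list again.
ids_by_name = {}
for item in data:
    ids_by_name.setdefault(item[1], []).append(item[0])

print(unique_names)
print(ids_by_name["pear"])
```

The grouping dict is optional. It is only worth building if you need the values for more than one name.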

CodePudding user response：

Alternatively, how can I find the values that are unique in column 2, e.g. apples, pear and bannana, and then filter by pear to find the data only of pear: [12, "pear", 24, 11], [14, "pear", 17, 11]?

I believe for the above, you have already answered it yourself. Correct me if I interpreted it wrong.

How can I change the array to only show data from user pear? I want to collect all the values from column 1 of user pear. (12, 14)

If you only want the values from the first column, you can transpose the array produced by your code and take its first row.

new_arr = np.array([[11, "apples", 22, 11], [12, "pear", 24, 11], [13, "bannana", 18, 11], [14, "pear", 17, 11]])
new_val = np.array(["pear"])
result = np.in1d(new_arr[:, 1], new_val)
z = new_arr[result]

The above is what you wrote. You can then do:

print(z.T[0])

# If you want it as a list, you can do
# print(list(z.T[0]))

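Output (this is the repr of z.T[0]; print() itself shows ['12' '14']; the values are strings because the mixed array is stored with a string dtype):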

array(['12', '14'], dtype='<U21')
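
For completeness, here is a sketch of the same thing using column slicing, which may be closer to what the np.unique attempt in the question was aiming for. It assumes the array from the question; unique_names, mask, pear_rows and pear_ids are illustrative names. Note that np.unique returns the values sorted, and that the ids come back as strings, so they are converted with astype(int).

```python
import numpy as np

new_arr = np.array([[11, "apples", 22, 11], [12, "pear", 24, 11],
                    [13, "bannana", 18, 11], [14, "pear", 17, 11]])

# Unique values of the second column (index 1), returned sorted.
unique_names = np.unique(new_arr[:, 1])

# Boolean mask of the rows whose second column equals "pear".
mask = new_arr[:, 1] == "pear"
pear_rows = new_arr[mask]

# First column of the matching rows, converted from strings to ints.
pear_ids = pear_rows[:, 0].astype(int)

print(unique_names.tolist())
print(pear_ids.tolist())
```

Slicing with z[:, 0] gives the same values as z.T[0]. When matching against several names at once, np.isin(new_arr[:, 1], names) can replace the == comparison.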