Given a NumPy array of the following format:
'aa', '1'
'ab', '1'
'ab', '1'
'ba', '2'
'ba', '2'
I can use numpy.unique to histogram the elements of each column individually. Histogramming the first column gives me the count of each unique element in that column:
'aa' = 1; 'ab' = 2; 'ba' = 2.
Histogramming the second column gives me
'1' = 3; '2' = 2.
I want to normalize the output of the first-column histogram by the output of the second-column histogram (each letter's count divided by the count of the number it appears with), giving:
'aa' = 1/3; 'ab' = 2/3; 'ba' = 2/2;
Is there a nice way to achieve this?
CodePudding user response:
Start by getting the unique values and counts of the second column, and turn them into a lookup dict mapping each value to its reciprocal count:
import numpy as np

unique_vals, counts = np.unique(arr[:,1], return_counts=True)
freq = 1 / counts  # reciprocal counts: '1' -> 1/3, '2' -> 1/2
lookup = dict(zip(unique_vals, freq))
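This first step can be checked in isolation with the question's sample data (the array literal below is just that data written out):

```python
import numpy as np

# the sample data from the question
arr = np.array([['aa', '1'],
                ['ab', '1'],
                ['ab', '1'],
                ['ba', '2'],
                ['ba', '2']])

unique_vals, counts = np.unique(arr[:, 1], return_counts=True)
freq = 1 / counts                      # reciprocal of each value's count
lookup = dict(zip(unique_vals, freq))  # {'1': 1/3, '2': 1/2}
```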
Then build a numpy array of the corresponding frequency for each row, and for each unique letter sum the frequencies of the rows where it appears:
freqs = np.vectorize(lookup.get)(arr[:,1])
unique_letters = np.unique(arr[:,0])
{ letter: np.sum(freqs[letter==arr[:,0]]) for letter in unique_letters}
which returns
{'aa': 0.3333333333333333, 'ab': 0.6666666666666666, 'ba': 1.0}
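As an alternative sketch that stays entirely inside NumPy (np.vectorize is a Python-level loop under the hood), the same result can be computed with return_inverse and np.bincount; the name result is just a hypothetical variable for illustration:

```python
import numpy as np

arr = np.array([['aa', '1'],
                ['ab', '1'],
                ['ab', '1'],
                ['ba', '2'],
                ['ba', '2']])

# count second-column values and map each row back to its value's count
vals, inv, cnts = np.unique(arr[:, 1], return_inverse=True, return_counts=True)
row_freq = 1 / cnts[inv]  # per-row weight: 1 / (count of that row's number)

# sum the per-row weights for each unique letter
letters, linv = np.unique(arr[:, 0], return_inverse=True)
totals = np.bincount(linv, weights=row_freq)
result = dict(zip(letters, totals))
```

This gives the same dict as above: {'aa': 1/3, 'ab': 2/3, 'ba': 1.0}.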