I'm converting sentences to embeddings/indexes generated through the OpenAI 'embeddings' endpoint.
E.g. I'm sending 'n' sentences ["sentenceA","sentenceB","sentenceC","sentenceD","sentenceE"]
and I'm getting as a response something like:
[
[0.001542, 0.889456, 0.155421, 0.884747], // array for sentenceA
[0.999956, 0.987778, 0.122222, 0.848484], // array for sentenceB
[0.123456, 0.588847, 0.945125, 0.911111], // array for sentenceC
(etc)
]
Each array has the very same length (in my use case, 1536 values per array).
I need to convert that list of 'n' arrays into a single array by averaging them element-wise (the first element of the result being the average of all the arrays' first elements, and so on), so that the result is just one array with 1536 elements.
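For example, using the sample arrays above, the first element of the result would be (0.001542 + 0.999956 + 0.123456) / 3 ≈ 0.374985.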
What would be the easiest/most efficient way to do this with Python / NumPy?
Thank you in advance, and have a great day! :D
CodePudding user response:
You can use NumPy's "mean" function, which computes the mean along the axis you provide. In code it looks something like this:
#We import NumPy and declare the lists
import numpy as np
arrays = [[0.001542, 0.889456, 0.155421, 0.884747],
          [0.999956, 0.987778, 0.122222, 0.848484],
          [0.123456, 0.588847, 0.945125, 0.911111]]
#Then we take the mean along axis 0 (element-wise across the lists)
mean_array = np.mean(arrays, axis=0)
If we print "mean_array" we get this:
array([0.37498467, 0.822027 , 0.40758933, 0.88144733])
This generalizes automatically: if you increase the number of lists, you will still get a single averaged array of the same length.
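The same call works unchanged for your real case of n arrays with 1536 values each. Here is a minimal sketch using random placeholder data and hypothetical variable names, just to show the shapes involved:
#Minimal sketch: n stand-in "embeddings" of length 1536 (random placeholder data)
import numpy as np
n = 5
embeddings = np.random.rand(n, 1536)          #stand-in for the n response arrays
mean_embedding = np.mean(embeddings, axis=0)  #element-wise average over the n arrays
print(mean_embedding.shape)                   #prints (1536,)
Note that np.mean returns a NumPy array; if you need a plain Python list again, you can call mean_embedding.tolist().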