efficient way of computing a list with mean of values in another list

Time:05-14

I need to compute a list with the mean values of another list. To be more precise, the input list has this form:

input_list =

['1.538075/42.507325',
 '1.537967/42.507690',
 '1.538292/42.507742',
 '1.538399/42.507376',
 '1.538075/42.507325']

And I need to compute a list with the mean of the values before and after the slash ("/"), like this result:

desired_output =

[1.5381616, 42.5074916]

I can obtain the desired_output correctly using this code:

import pandas as pd

desired_output = pd.Series(input_list)\
                .apply(lambda r: pd.Series(r.split('/')))\
                .astype(float)\
                .mean()\
                .tolist()

However, I have a very large number of input lists and the code above is somewhat slow, so I need a more efficient way to do it.

Any suggestions?

CodePudding user response:

You don't really need pandas here; a simple list comprehension should work:

input_list = ['1.538075/42.507325',
 '1.537967/42.507690',
 '1.538292/42.507742',
 '1.538399/42.507376',
 '1.538075/42.507325']

from statistics import mean

out = [mean(map(float, x)) for x in zip(*(x.split('/') for x in input_list))]

output: [1.5381616, 42.5074916]
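Since the question mentions a large number of input lists, the comprehension above can be wrapped in a function and applied to each list in turn. This is only a minimal sketch, assuming every string holds exactly two numbers separated by a slash. The names `mean_pair` and `many_lists` are hypothetical, and `math.fsum` is swapped in for `statistics.mean`, which is usually cheaper because `statistics.mean` works with exact fractions internally:

```python
from math import fsum

def mean_pair(strings):
    """Return [mean of the values before '/', mean of the values after '/']."""
    n = len(strings)
    # Split each 'a/b' string, then transpose into (all a's, all b's).
    columns = zip(*(s.split('/') for s in strings))
    # For an empty input, zip() yields nothing, so the result is [].
    return [fsum(map(float, col)) / n for col in columns]

input_list = ['1.538075/42.507325',
              '1.537967/42.507690',
              '1.538292/42.507742',
              '1.538399/42.507376',
              '1.538075/42.507325']

# Hypothetical stand-in for the "very large number of input lists".
many_lists = [input_list, input_list]
results = [mean_pair(lst) for lst in many_lists]
```

Whether this beats the pandas or numpy variants depends on how long each list is, so it is worth timing on real data.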

Or using numpy:

import numpy as np

np.vstack([np.fromstring(s, sep='/') for s in input_list]).mean(0).tolist()

CodePudding user response:

.apply is the slow part, but luckily pandas has the .str accessor to vectorise string operations. This should be considerably faster:

desired_output = (pd.Series(input_list)
                  .str.split('/', expand=True)
                  .astype(float)
                  .mean()
                  .tolist())

CodePudding user response:

Create a NumPy array with dtype=float, then calculate the mean along axis=0:

import numpy as np

np.array([s.split('/') for s in input_list], dtype=float).mean(0)

array([ 1.5381616, 42.5074916])

CodePudding user response:

Another way, using pandas and a list comprehension:

import pandas as pd

pd.DataFrame([s.split('/') for s in input_list]).astype(float).mean().to_list()
# [1.5381616, 42.5074916]
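
None of the answers above includes measurements, so the speed claims are best checked on your own data. One way to do that is a small `timeit` harness. The sketch below uses only the standard library; `time_candidates`, `with_statistics`, and `with_builtin_sum` are hypothetical names, and the enlarged `input_list` is just a stand-in for real data. The pandas and numpy variants can be added to the dict in the same way if those libraries are installed:

```python
import timeit
from statistics import mean

# Stand-in data: the sample list repeated to 1000 entries.
input_list = ['1.538075/42.507325',
              '1.537967/42.507690',
              '1.538292/42.507742',
              '1.538399/42.507376',
              '1.538075/42.507325'] * 200

def with_statistics(strings):
    # Same approach as the list-comprehension answer.
    return [mean(map(float, col)) for col in zip(*(s.split('/') for s in strings))]

def with_builtin_sum(strings):
    # Plain sum()/len() instead of statistics.mean.
    n = len(strings)
    return [sum(map(float, col)) / n for col in zip(*(s.split('/') for s in strings))]

def time_candidates(candidates, data, repeat=3, number=20):
    """Return {name: best observed seconds per call} for each candidate."""
    timings = {}
    for name, func in candidates.items():
        runs = timeit.repeat(lambda: func(data), repeat=repeat, number=number)
        timings[name] = min(runs) / number
    return timings

timings = time_candidates(
    {'statistics.mean': with_statistics, 'sum/len': with_builtin_sum},
    input_list,
)
# Print the candidates from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Taking the minimum over several repeats reduces noise from other processes; the absolute numbers will vary by machine and list length.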