I need to compute a list with the mean values of another list. To be more precise, the input list has this form:
input_list =
['1.538075/42.507325',
'1.537967/42.507690',
'1.538292/42.507742',
'1.538399/42.507376',
'1.538075/42.507325']
And I need to compute a list with the mean of the values before and after the slash ("/"), like this result:
desired_output =
[1.5381616, 42.5074916]
I can obtain the desired_output correctly using this code:
import pandas as pd
desired_output = pd.Series(input_list)\
.apply(lambda r: pd.Series(r.split('/')))\
.astype(float)\
.mean()\
.tolist()
However, I have a very large number of input lists and the proposed code is somewhat slow, so I need to find a more efficient way to do it.
Any suggestions?
CodePudding user response:
You don't really need pandas here; a simple list comprehension should work:
input_list = ['1.538075/42.507325',
'1.537967/42.507690',
'1.538292/42.507742',
'1.538399/42.507376',
'1.538075/42.507325']
from statistics import mean
out = [mean(map(float, x)) for x in zip(*(x.split('/') for x in input_list))]
output: [1.5381616, 42.5074916]
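If you stay with the standard library, it may also be worth trying statistics.fmean (Python 3.8+), which works in floating point and is documented as running faster than statistics.mean. A minimal sketch, assuming every string has exactly one slash; whether it actually helps on your data is something you would need to time:

```python
from statistics import fmean  # available from Python 3.8

input_list = ['1.538075/42.507325',
              '1.537967/42.507690',
              '1.538292/42.507742',
              '1.538399/42.507376',
              '1.538075/42.507325']

# Split each string once, transpose into two columns, then average each column.
columns = zip(*(s.split('/') for s in input_list))
out = [fmean([float(v) for v in col]) for col in columns]
print(out)
```

The result should match the list comprehension above up to floating-point rounding.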
Or using numpy:
import numpy as np
np.vstack([np.fromstring(s, sep='/') for s in input_list]).mean(0).tolist()
CodePudding user response:
.apply is the slow part, but luckily pandas has the .str accessor to vectorise string operations. This should be considerably faster:
import pandas as pd
desired_output = (pd.Series(input_list)
.str.split('/', expand=True)
.astype(float)
.mean()
.tolist())
CodePudding user response:
Create a numpy array with dtype=float, then calculate the mean along axis=0:
import numpy as np
np.array([s.split('/') for s in input_list], dtype=float).mean(0)
array([ 1.5381616, 42.5074916])
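Which approach wins probably depends on the list lengths and library versions, so a rough timing harness may help before committing to one. The sketch below uses made-up sample data and hypothetical function names (mean_listcomp, mean_numpy) that are not from the question; swap in your real lists:

```python
import random
import timeit
from statistics import fmean  # Python 3.8+

import numpy as np

def mean_listcomp(items):
    # Pure-Python variant: split, transpose, average each column.
    columns = zip(*(s.split('/') for s in items))
    return [fmean([float(v) for v in col]) for col in columns]

def mean_numpy(items):
    # NumPy variant: build an (n, 2) float array, average over rows.
    return np.array([s.split('/') for s in items], dtype=float).mean(0).tolist()

# Synthetic data for illustration only (1000 'a/b' strings).
random.seed(0)
sample = [f"{random.uniform(1, 2):.6f}/{random.uniform(42, 43):.6f}"
          for _ in range(1000)]

for func in (mean_listcomp, mean_numpy):
    seconds = timeit.timeit(lambda: func(sample), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 calls")
```

Timings will vary by machine, so treat the numbers as a relative comparison only.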
CodePudding user response:
Another way, using pandas and a list comprehension:
import pandas as pd
pd.DataFrame([s.split('/') for s in input_list]).astype(float).mean().to_list()
# [1.5381616, 42.5074916]