To do this, I have a list of lists (my clusters), for example:
asset_clusts = [[0, 1], [3, 5], [2, 4, 12], ...]
The original DataFrame (which I call 'x' in my code) holds the return time series of S&P 500 companies.
I want to select columns [0, 1] of the original DataFrame, compute their mean row by row, and store it in a new DataFrame; then compute the row-wise mean of columns [3, 5] and add it to the new DataFrame as another column, and so on.
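A minimal sketch of that goal, assuming the clusters hold column positions (as used with iloc). The toy numbers below are made up for illustration and are not real S&P 500 returns:

```python
import pandas as pd

# Made-up returns for four assets over three dates (illustration only)
x = pd.DataFrame(
    [[0.01, 0.03, -0.02, 0.00],
     [0.02, 0.00, 0.01, 0.03],
     [-0.01, 0.01, 0.02, 0.04]],
    columns=["A", "B", "C", "D"],
)
clusters = [[0, 1], [2, 3]]

# One output column per cluster: the row-wise mean of that cluster's columns.
# The dict keys (0, 1, ...) become the column labels of the new DataFrame.
mu = pd.DataFrame({k: x.iloc[:, cluster].mean(axis=1)
                   for k, cluster in enumerate(clusters)})
print(mu)
```

Building all the columns at once from a dict keeps each cluster's mean in its own column instead of repeatedly reassigning a single variable.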
mu = pd.DataFrame()
for j in range(get_number_of_elements(asset_clusts)):
    mu = x.iloc[:, asset_clusts[j]].mean(axis=1)
But it gives me only one column, and when I checked, that column is the mean of the last cluster's columns.
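The loop rebinds mu on every pass (mu = ... replaces the whole object), so only the last cluster's mean survives. A minimal fix, sketched here with made-up data, is to start from an empty DataFrame that shares x's row index and assign each cluster's mean into its own column:

```python
import pandas as pd

# Made-up data; columns are addressed by position, as in the question
x = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 7.0]})
asset_clusts = [[0, 1], [1, 2]]

mu = pd.DataFrame(index=x.index)  # same row index as x, no columns yet
for j, cluster in enumerate(asset_clusts):
    # Write into column j instead of rebinding mu
    mu[j] = x.iloc[:, cluster].mean(axis=1)
print(mu)
```

Because each cluster is handled independently, this also works when clusters overlap (column 1 appears in both clusters above).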
In case it is unclear, get_number_of_elements is defined as:
def get_number_of_elements(clist):
    count = 0
    for element in clist:
        count += 1
    return count
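Since get_number_of_elements only counts the clusters, Python's built-in len returns the same number. A quick check, using the first three clusters from the example list:

```python
def get_number_of_elements(clist):
    count = 0
    for _ in clist:
        count += 1  # one increment per cluster
    return count

asset_clusts = [[0, 1], [3, 5], [2, 4, 12]]
print(get_number_of_elements(asset_clusts), len(asset_clusts))
```

So range(len(asset_clusts)) would do the same job, and looping with for cluster in asset_clusts avoids the index altogether.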
Answer:
I solved it; in case it helps others, here is the final function:
import pandas as pd

def clustered_series(x, org_asset_clust):
    """
    x: DataFrame of returns
    org_asset_clust: list of clusters (lists of column positions)
    Returns a DataFrame with one column per cluster: the row-wise mean of its columns.
    """
    def get_number_of_elements(org_asset_clust):
        # Counts the clusters; equivalent to len(org_asset_clust)
        count = 0
        for element in org_asset_clust:
            count += 1
        return count

    mu = []
    for j in range(get_number_of_elements(org_asset_clust)):
        # Row-wise mean of the columns in cluster j
        mu.append(x.iloc[:, org_asset_clust[j]].mean(axis=1))
    # Place the per-cluster means side by side, one column each
    cluster_mean = pd.concat(mu, axis=1)
    return cluster_mean
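If the clusters are disjoint (no column belongs to two clusters; this is an assumption, not something stated above), the same result can be obtained without an explicit loop over clusters by labelling each column with its cluster number and grouping. A sketch, where clustered_series_groupby is a hypothetical name:

```python
import pandas as pd

def clustered_series_groupby(x, clusters):
    """Row-wise mean per cluster; assumes disjoint clusters of column positions."""
    # Map each clustered column label to its cluster number
    labels = pd.Series({x.columns[pos]: k
                        for k, cluster in enumerate(clusters)
                        for pos in cluster})
    # Transpose so columns become rows, group those rows by cluster,
    # average them, then transpose back so dates are rows again
    return x[labels.index].T.groupby(labels).mean().T

# Made-up data for illustration
x = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0],
                  "C": [5.0, 7.0], "D": [0.0, 1.0]})
res = clustered_series_groupby(x, [[0, 1], [2, 3]])
print(res)
```

With overlapping clusters the dict would keep only one cluster per column, so the loop-based clustered_series is the safer choice in that case.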