I would like to calculate RMS, Count, SUM to array inside all columns of pandas dataframe and then fulfill outputs into new three dataframes as shown below
P.S > solution should deal with N numbers of columns, in my case, I have around 300 columns x,y,z,a,b,c ....... etc ...... N
ID x y z ….. EF407412 [471, 1084, 1360, 2284] [1408, 1572, 2277] [977, 1003, 1493, 1519, 1650, 1676, 2804] ….. KM043272 [2173] [1293, 2354,] [1200] ….. KM043273 ….. ….. ….. ….. ….. ….. ….. ….. …..
Dataframe_RMS
ID x y z …..
EF407412 1454.5749 1792.4263 1685.6893 …..
KM043272 2173 1899.1004 1200 …..
KM043273 ….. ….. ….. …..
….. ….. ….. ….. …..
Dataframe_Count
ID x y z …..
EF407412 4 3 7 …..
KM043272 1 2 1 …..
KM043273 ….. ….. ….. …..
….. ….. ….. ….. …..
CodePudding user response:
Updating answer as per the OP's comment - for any number of Columns Check Below code:
import pandas as pd
from ast import literal_eval
import numpy as np
df = pd.DataFrame({'ID':['EF407412','KM043272']
, 'x': ['[471, 1084, 1360, 2284]','[2173]']
, 'y': ['[1408, 1572, 2277]','[1293, 2354,]']
, 'z': ['[977, 1003, 1493, 1519, 1650, 1676, 2804]','[1200]']} )
col_num = df.shape[1]
COUNT
df[[i "_count" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1].\
format( *(tuple([len(literal_eval(x[col])) for col in df.columns[1:col_num]] ))),axis=1).\
astype('str').str.split(',', expand=True).values
df[['ID'] ([ col for col in df.columns if col.endswith('count')])]
OUTPUT:
SUM
df[[i "_sum" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1].\
format( *(tuple([sum(literal_eval(x[col])) for col in df.columns[1:col_num]] ))),axis=1).\
astype('str').str.split(',', expand=True).values
df[['ID'] ([ col for col in df.columns if col.endswith('sum')])]
Output:
RMS
df[[i "_rms" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1].\
format( *(tuple([np.sqrt(np.mean(np.square(literal_eval(x[col])))) for col in df.columns[1:col_num]] ))),axis=1).\
astype('str').str.split(',', expand=True).values
df[['ID'] ([ col for col in df.columns if col.endswith('rms')])]
Output: