Function of a formula in python for big dataset

Time:10-06

Say, I need to convert the following formula into python and use it on a data frame. The formula is as follows.

[Image: the formula, not reproduced here]

The data frame has values for Xj and bj. It looks like the following.

df.head()
type  name  OR
1     SAP1  11.21
1     SAP1  1301
2     SAP1  0.7578
2     LUF1  1447
2     LUF1  0.7578
1     ANK3  1150
1     ANK3  0.9909
1     ANK3  1535
1     ACR   0.9909
1     ACR   1535

In the data frame above, Xj is the type column and bj is the OR column. For each name, I need the score Ŝ from the formula, so that in the end each name has a single S score.

I have implemented something like this:

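    def score(df):
        # Sum type and OR separately for each name
        df_sum = df[['type', 'name', 'OR']].groupby('name').sum().reset_index()
        type_sum = df_sum['type']  # renamed so the built-in sum() is not shadowed
        or_sum = df_sum['OR']
        # Product of the two per-name sums (not the sum of type * OR per row)
        score = type_sum * or_sum
        return score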

The problem is that I need a single score for each name value, not many scores across the repeated name values. In the end, I need only 4 rows with 4 score values.

This works on small datasets. However, when I pass a big data frame in which each name has multiple values in the OR and type columns, I get multiple scores for the same name.

Note: The values in the type column are binary, so I am not sure whether I should use count or sum in the groupby.

Answer:

I think that if you change the order of operations, you get the expected result:

1. Calculate xi*bi for each row:

    df['xibi'] = df['type'] * df['OR']

2. Group by name and sum the products:

    res = df.groupby('name')['xibi'].sum()
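
The two steps can be combined into one function. The following is a minimal sketch, assuming the formula is the per-name sum of type * OR, as the steps in this answer imply (the formula image itself is not available). The data frame is rebuilt from the rows shown in the question; the function name score is taken from the question, while the local name products and the output column name "score" are chosen here for illustration.

```python
import pandas as pd

# Sample rows copied from the df.head() output in the question.
df = pd.DataFrame({
    "type": [1, 1, 2, 2, 2, 1, 1, 1, 1, 1],
    "name": ["SAP1", "SAP1", "SAP1", "LUF1", "LUF1",
             "ANK3", "ANK3", "ANK3", "ACR", "ACR"],
    "OR": [11.21, 1301, 0.7578, 1447, 0.7578,
           1150, 0.9909, 1535, 0.9909, 1535],
})


def score(df):
    """Return one score per name: the sum of type * OR over that name's rows."""
    # Multiply row by row first, so each product stays tied to its own row.
    products = df["type"] * df["OR"]
    # Then group the products by name and add them up.
    return products.groupby(df["name"]).sum().rename("score").reset_index()


scores = score(df)
print(scores)
```

For the sample above this returns four rows, one per name, however many times a name repeats in the input. Multiplying before grouping matters: summing type and OR separately and multiplying the two totals gives a different number whenever a name has more than one row.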