Home > Net >  how to sum each rows the credit score with the condition using pandas?
how to sum each rows the credit score with the condition using pandas?

Time:06-17

how to sum each rows the credit score with the condition that if the course does not pass, then the credit score is not counted?

enter image description here

example: enter image description here

expected outcome: if pass

df_ilkom1['sks'] = (df_ilkom['s_matdas'] df_ilkom['s_fisdas'] df_ilkom['s_pentekkin'] df_ilkom['s_pemdas'] df_ilkom['s_pengpend'] df_ilkom['s_pendag'] df_ilkom['s_pancasila'] df_ilkom['s_bindo'])

*first row df_ilkom1['sks'] = 3 3 2 3 2 3 2 3 = 21 but in n_pendag = D, automatically s_pendag can't use for the input.

I was thinking like this

df_ilkom1['SKS'] = (df_ilkom1.loc[:, 's_matdas':'s_bindo'].sum(axis=1) - df_ilkom1.loc[:, 'n_matdas':'n_bindo'].isin(['D','E'])

but it sure doesn't work.

any suggestion

CodePudding user response:

Maybe first you could create column SKS with default value

df['SKS'] = 0

next select rows which don't have 'D','E' in n_pendag

mask = ~df['n_pendag'].isin(['D','E'])

and run sum() only on these rows

df['SKS'][mask] = df.loc[:,'n_matdas':'n_bindo'][mask].sum(axis=1)

Minimal working code:

import pandas as pd

data = {
    'n_pendag': ['A','B','C','D','E'], 
    'n_matdas': [1,2,3,4,5], 
    'n_bindo' : [6,7,8,9,0]
}

df = pd.DataFrame(data)

df['SKS'] = 0

mask = ~df['n_pendag'].isin(['D','E'])
df['SKS'][mask] = df.loc[:,'n_matdas':'n_bindo'][mask].sum(axis=1)

print(df)

Result:

 n_pendag  n_matdas  n_bindo  SKS
0        A         1        6    7
1        B         2        7    9
2        C         3        8   11
3        D         4        9    0
4        E         5        0    0

Other idea is to use apply() to run function with if/else

def get_sum(row):
    if row['n_pendag'] in ['D','E']:
        return 0
    else:
        return row.loc['n_matdas':'n_bindo'].sum()

df['SKS'] = df.apply(get_sum, axis=1)

Minimal working code:

import pandas as pd

data = {
    'n_pendag': ['A','B','C','D','E'], 
    'n_matdas': [1,2,3,4,5], 
    'n_bindo' : [6,7,8,9,0]
}

df = pd.DataFrame(data)

def get_sum(row):
    if row['n_pendag'] in ['D','E']:
        return 0
    else:
        return row.loc['n_matdas':'n_bindo'].sum()

df['SKS'] = df.apply(get_sum, axis=1)

print(df)

Next idea is to use numpy.where(condition, value_for_true, value_for_false)

import pandas as pd
import numpy as np

data = {
    'n_pendag': ['A','B','C','D','E'], 
    'n_matdas': [1,2,3,4,5], 
    'n_bindo' : [6,7,8,9,0]
}

df = pd.DataFrame(data)

df['SKS'] = np.where(~df['n_pendag'].isin(['D','E']), df.loc[:,'n_matdas':'n_bindo'].sum(axis=1), 0)

print(df)

And last idea: first calculate sum for all rows and later put 0 in SKS when 'D','E' in n_pendag

import pandas as pd

data = {
    'n_pendag': ['A','B','C','D','E'], 
    'n_matdas': [1,2,3,4,5], 
    'n_bindo' : [6,7,8,9,0]
}

df = pd.DataFrame(data)

df['SKS'] = df.loc[:,'n_matdas':'n_bindo'].sum(axis=1)

df['SKS'][ df['n_pendag'].isin(['D','E']) ] = 0

print(df)

CodePudding user response:

IIUC, we have a DataFrame with a number of columns with letters (codes), and the same number of columns with numbers (scores). For each row we want to sum all scores, except the ones for which the corresponding code occurs in a certain list (exceptions). We can use np.where() to accomplish this.

Creating a sample DataFrame:

import pandas as pd
import numpy as np
import string
n=1000
letters = np.random.choice([x for x in string.ascii_uppercase[:5]], n).reshape(int(n/4),4)
numbers = np.random.randint(1,10, (int(n/4),4))

df = pd.DataFrame(np.hstack([letters, numbers]))
df.iloc[:,4:] = df.iloc[:,4:].astype(int)
df.columns = [f'code_{idx 1}' if idx < 4 else f'score_{idx-3}' for idx in range(8)]
df.head()

  code_1 code_2 code_3 code_4 score_1 score_2 score_3 score_4
0      D      C      D      D       7       7       6       2
1      D      A      C      C       9       7       4       9
2      B      D      E      A       2       3       3       9
3      D      A      A      B       6       1       6       2
4      A      A      A      E       7       6       3       9

Performing calculation and adding column with the result:

exceptions = ['D','E']
df['total_score'] = np.where(~df.loc[:,'code_1':'code_4']
                             .isin(exceptions),df.loc[:,'score_1':'score_4'], 0).sum(axis=1)
df.head()

  code_1 code_2 code_3 code_4 score_1 score_2 score_3 score_4 total_score
0      D      C      D      D       7       7       6       2           7 # summing score_2 ('C')
1      D      A      C      C       9       7       4       9          20 # summing score_2, 3, 4 ('A', 'C', 'C')
2      B      D      E      A       2       3       3       9          11
3      D      A      A      B       6       1       6       2           9
4      A      A      A      E       7       6       3       9          16
  • Related