how to sum each rows the credit score with the condition that if the course does not pass, then the credit score is not counted?
example: enter image description here
expected outcome: if pass
df_ilkom1['sks'] = (df_ilkom['s_matdas'] df_ilkom['s_fisdas'] df_ilkom['s_pentekkin'] df_ilkom['s_pemdas'] df_ilkom['s_pengpend'] df_ilkom['s_pendag'] df_ilkom['s_pancasila'] df_ilkom['s_bindo'])
*first row df_ilkom1['sks'] = 3 3 2 3 2 3 2 3 = 21 but in n_pendag = D, automatically s_pendag can't use for the input.
I was thinking like this
df_ilkom1['SKS'] = (df_ilkom1.loc[:, 's_matdas':'s_bindo'].sum(axis=1) - df_ilkom1.loc[:, 'n_matdas':'n_bindo'].isin(['D','E'])
but it sure doesn't work.
any suggestion
CodePudding user response:
Maybe first you could create column SKS
with default value
df['SKS'] = 0
next select rows which don't have 'D','E'
in n_pendag
mask = ~df['n_pendag'].isin(['D','E'])
and run sum()
only on these rows
df['SKS'][mask] = df.loc[:,'n_matdas':'n_bindo'][mask].sum(axis=1)
Minimal working code:
import pandas as pd
data = {
'n_pendag': ['A','B','C','D','E'],
'n_matdas': [1,2,3,4,5],
'n_bindo' : [6,7,8,9,0]
}
df = pd.DataFrame(data)
df['SKS'] = 0
mask = ~df['n_pendag'].isin(['D','E'])
df['SKS'][mask] = df.loc[:,'n_matdas':'n_bindo'][mask].sum(axis=1)
print(df)
Result:
n_pendag n_matdas n_bindo SKS
0 A 1 6 7
1 B 2 7 9
2 C 3 8 11
3 D 4 9 0
4 E 5 0 0
Other idea is to use apply()
to run function with if/else
def get_sum(row):
if row['n_pendag'] in ['D','E']:
return 0
else:
return row.loc['n_matdas':'n_bindo'].sum()
df['SKS'] = df.apply(get_sum, axis=1)
Minimal working code:
import pandas as pd
data = {
'n_pendag': ['A','B','C','D','E'],
'n_matdas': [1,2,3,4,5],
'n_bindo' : [6,7,8,9,0]
}
df = pd.DataFrame(data)
def get_sum(row):
if row['n_pendag'] in ['D','E']:
return 0
else:
return row.loc['n_matdas':'n_bindo'].sum()
df['SKS'] = df.apply(get_sum, axis=1)
print(df)
Next idea is to use numpy.where(condition, value_for_true, value_for_false)
import pandas as pd
import numpy as np
data = {
'n_pendag': ['A','B','C','D','E'],
'n_matdas': [1,2,3,4,5],
'n_bindo' : [6,7,8,9,0]
}
df = pd.DataFrame(data)
df['SKS'] = np.where(~df['n_pendag'].isin(['D','E']), df.loc[:,'n_matdas':'n_bindo'].sum(axis=1), 0)
print(df)
And last idea: first calculate sum
for all rows and later put 0
in SKS
when 'D','E'
in n_pendag
import pandas as pd
data = {
'n_pendag': ['A','B','C','D','E'],
'n_matdas': [1,2,3,4,5],
'n_bindo' : [6,7,8,9,0]
}
df = pd.DataFrame(data)
df['SKS'] = df.loc[:,'n_matdas':'n_bindo'].sum(axis=1)
df['SKS'][ df['n_pendag'].isin(['D','E']) ] = 0
print(df)
CodePudding user response:
IIUC, we have a DataFrame with a number of columns with letters (codes), and the same number of columns with numbers (scores). For each row we want to sum all scores, except the ones for which the corresponding code occurs in a certain list (exceptions). We can use np.where()
to accomplish this.
Creating a sample DataFrame:
import pandas as pd
import numpy as np
import string
n=1000
letters = np.random.choice([x for x in string.ascii_uppercase[:5]], n).reshape(int(n/4),4)
numbers = np.random.randint(1,10, (int(n/4),4))
df = pd.DataFrame(np.hstack([letters, numbers]))
df.iloc[:,4:] = df.iloc[:,4:].astype(int)
df.columns = [f'code_{idx 1}' if idx < 4 else f'score_{idx-3}' for idx in range(8)]
df.head()
code_1 code_2 code_3 code_4 score_1 score_2 score_3 score_4
0 D C D D 7 7 6 2
1 D A C C 9 7 4 9
2 B D E A 2 3 3 9
3 D A A B 6 1 6 2
4 A A A E 7 6 3 9
Performing calculation and adding column with the result:
exceptions = ['D','E']
df['total_score'] = np.where(~df.loc[:,'code_1':'code_4']
.isin(exceptions),df.loc[:,'score_1':'score_4'], 0).sum(axis=1)
df.head()
code_1 code_2 code_3 code_4 score_1 score_2 score_3 score_4 total_score
0 D C D D 7 7 6 2 7 # summing score_2 ('C')
1 D A C C 9 7 4 9 20 # summing score_2, 3, 4 ('A', 'C', 'C')
2 B D E A 2 3 3 9 11
3 D A A B 6 1 6 2 9
4 A A A E 7 6 3 9 16