I have this Pandas DataFrame
I'm attempting to create a new column named Needed using the code below. The rule is:
In case of "KHOÁ NHÓM", for EVERY 25 giohoc, Needed = dauvao_overall + 0.5.
In case of "KHOÁ KÈM", for EVERY 20 giohoc, Needed = dauvao_overall + 0.5.
My idea is to divide giohoc by 25 for "KHOÁ NHÓM" and by 20 for "KHOÁ KÈM".
If the result is < 1, then Needed = dauvao_overall.
If the result is >= 1 and < 2, then Needed = dauvao_overall + 0.5.
If the result is >= 2 and < 3, then Needed = dauvao_overall + 1.
All the way up to .... Needed = dauvao_overall + 7.
Although I succeeded, I believe there is a shorter and cleaner way to achieve the same result. Please tell me what I can do to improve the code. Thank you!
empty = []
for index, row in didiem.iterrows():
    # KHOÁ NHÓM
    if row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 < 1:
        empty.append(row.dauvao_overall)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 >= 1 and row.giohoc/25 < 2:
        empty.append(row.dauvao_overall + 0.5)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 >= 2 and row.giohoc/25 < 3:
        empty.append(row.dauvao_overall + 1)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 >= 3 and row.giohoc/25 < 4:
        empty.append(row.dauvao_overall + 1.5)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 >= 4 and row.giohoc/25 < 5:
        empty.append(row.dauvao_overall + 2)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 >= 5 and row.giohoc/25 < 6:
        empty.append(row.dauvao_overall + 2.5)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 >= 6 and row.giohoc/25 < 7:
        empty.append(row.dauvao_overall + 3)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 >= 7 and row.giohoc/25 < 8:
        empty.append(row.dauvao_overall + 3.5)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 >= 8 and row.giohoc/25 < 9:
        empty.append(row.dauvao_overall + 4.0)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 >= 9 and row.giohoc/25 < 10:
        empty.append(row.dauvao_overall + 4.5)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/25 >= 10 and row.giohoc/25 < 11:
        empty.append(row.dauvao_overall + 5)
    elif row.group_kh_ten == "KHOÁ NHÓM" and row.giohoc/20 >= 14 and row.giohoc/20 < 15:
        empty.append(row.dauvao_overall + 7.0)
    # KHOÁ KÈM
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 < 1:
        empty.append(row.dauvao_overall)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 1 and row.giohoc/20 < 2:
        empty.append(row.dauvao_overall + 0.5)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 2 and row.giohoc/20 < 3:
        empty.append(row.dauvao_overall + 1)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 3 and row.giohoc/20 < 4:
        empty.append(row.dauvao_overall + 1.5)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 4 and row.giohoc/20 < 5:
        empty.append(row.dauvao_overall + 2)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 5 and row.giohoc/20 < 6:
        empty.append(row.dauvao_overall + 2.5)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 6 and row.giohoc/20 < 7:
        empty.append(row.dauvao_overall + 3)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 7 and row.giohoc/20 < 8:
        empty.append(row.dauvao_overall + 3.5)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 8 and row.giohoc/20 < 9:
        empty.append(row.dauvao_overall + 4.0)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 9 and row.giohoc/20 < 10:
        empty.append(row.dauvao_overall + 4.5)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 10 and row.giohoc/20 < 11:
        empty.append(row.dauvao_overall + 5.0)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 11 and row.giohoc/20 < 12:
        empty.append(row.dauvao_overall + 5.5)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 12 and row.giohoc/20 < 13:
        empty.append(row.dauvao_overall + 6.0)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 13 and row.giohoc/20 < 14:
        empty.append(row.dauvao_overall + 6.5)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 14 and row.giohoc/20 < 15:
        empty.append(row.dauvao_overall + 7.0)
    elif row.group_kh_ten == "KHOÁ KÈM" and row.giohoc/20 >= 15 and row.giohoc/20 < 16:
        empty.append(row.dauvao_overall + 7.5)
    else:
        empty.append("inspect")
didiem["Needed"] = empty
CodePudding user response:
I think this will do what you want (I only solved it for one of your cases...)
import numpy
import pandas

num_rows = 1000
# some random values between 2 and 10 for this column
dauvao_overall = numpy.random.uniform(2, 10, num_rows)
# some random integers from 1 to 199 for this column
giohoc = numpy.random.randint(1, 200, num_rows)
# some random values for this column
group_kh_ten = numpy.random.choice(["KHOA NHOM", "KHOA KEM"], num_rows)
# make a dataframe
df = pandas.DataFrame({"dauvao_overall": dauvao_overall, "giohoc": giohoc, "group_kh_ten": group_kh_ten})
# start with a float column so the assignment below keeps a float dtype
df['needed'] = 0.0
# here is how you would solve KHOA KEM
khoa_kem = df['group_kh_ten'] == 'KHOA KEM'
# add 0.5 for every full block of 20 giohoc (floor division counts the full blocks)
df.loc[khoa_kem, "needed"] = df.loc[khoa_kem, "dauvao_overall"] + 0.5 * (df.loc[khoa_kem, "giohoc"] // 20)
print(df)
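To cover both course types in one pass, one option is to map each group_kh_ten value to its block size and floor-divide the whole giohoc column by it. The following is only a minimal sketch: it assumes the accented labels from the question, and the names BLOCK_HOURS and add_needed are made up for illustration. Unlike the question's loop, rows with an unknown group come out as NaN here rather than "inspect", and no upper limit on giohoc is checked.

```python
import pandas as pd

# Hours per 0.5 step for each course type (values taken from the question).
# BLOCK_HOURS and add_needed are illustrative names, not from the original post.
BLOCK_HOURS = {"KHOÁ NHÓM": 25, "KHOÁ KÈM": 20}


def add_needed(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'Needed' column computed column-wise."""
    out = df.copy()
    # Group names missing from BLOCK_HOURS map to NaN, so Needed is NaN for them.
    block = out["group_kh_ten"].map(BLOCK_HOURS)
    # Floor division counts the completed blocks; each one adds 0.5.
    out["Needed"] = out["dauvao_overall"] + 0.5 * (out["giohoc"] // block)
    return out


sample = pd.DataFrame({
    "group_kh_ten": ["KHOÁ NHÓM", "KHOÁ KÈM", "KHOÁ NHÓM"],
    "dauvao_overall": [2.0, 3.5, 4.0],
    "giohoc": [70, 80, 10],
})
result = add_needed(sample)
print(result)
```

Because the arithmetic runs on whole columns, there is no Python-level loop over rows, which tends to matter once the DataFrame gets large.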
CodePudding user response:
First, define a function that calculates the Needed value. It receives a DataFrame row and does the calculation.
def fun(row):
    group_kh, overall, giohoc = [row[col_name]
                                 for col_name in ['group_kh_ten', 'dauvao_overall', 'giohoc']]
    needed = 'inspect'  # fallback so an unknown group does not leave needed unassigned
    match group_kh:  # match/case requires Python 3.10+
        case 'KHOÁ NHÓM':
            needed = overall + (giohoc // 25) * 0.5
        case 'KHOÁ KÈM':
            needed = overall + (giohoc // 20) * 0.5
            if giohoc // 20 >= 16:
                needed = 'inspect'
        case _:
            print("error: wrong group_kh_ten")
    return needed
Apply the function on each row of the dataframe:
df['Needed'] = df.apply(fun, axis=1)
Example:
  group_kh_ten  dauvao_overall  giohoc
0    KHOÁ NHÓM             2.0    70.0
1     KHOÁ KÈM             3.5    80.0
Apply the function fun:
df['Needed'] = df.apply(fun, axis=1)
Output:
  group_kh_ten  dauvao_overall  giohoc  Needed
0    KHOÁ NHÓM             2.0    70.0     3.0
1     KHOÁ KÈM             3.5    80.0     5.5
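Note that match/case is only available from Python 3.10 onwards. On older versions, the same row-wise logic could be written with a dictionary lookup instead. This is just a sketch: RULES and needed_for_row are hypothetical names, and the limit of 16 steps for "KHOÁ KÈM" is an assumption copied from the check in fun above.

```python
import pandas as pd

# Block size in hours, plus the step count at which a row is flagged "inspect".
# RULES and needed_for_row are illustrative names; max_steps=None means no limit.
RULES = {
    "KHOÁ NHÓM": {"block": 25, "max_steps": None},
    "KHOÁ KÈM": {"block": 20, "max_steps": 16},
}


def needed_for_row(row):
    rule = RULES.get(row["group_kh_ten"])
    if rule is None:
        return "inspect"  # unknown course type
    steps = row["giohoc"] // rule["block"]  # number of completed blocks
    if rule["max_steps"] is not None and steps >= rule["max_steps"]:
        return "inspect"  # beyond the range the rule is assumed to cover
    return row["dauvao_overall"] + 0.5 * steps


df = pd.DataFrame({
    "group_kh_ten": ["KHOÁ NHÓM", "KHOÁ KÈM", "KHOÁ KÈM", "OTHER"],
    "dauvao_overall": [2.0, 3.5, 3.0, 1.0],
    "giohoc": [70, 80, 400, 10],
})
df["Needed"] = df.apply(needed_for_row, axis=1)
print(df)
```

Keeping the divisors in one dictionary also means a new course type only needs a new entry rather than another branch.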