I have a huge dataframe with ~100,000 rows. I have this code but it is taking way too long to execute. Is there any way to make this more efficient?
df["Grade Band"] = ""
k5 = ["0", "1", "2", "3", "4", "5"]
ms = ["6", "7", "8"]
hs = ["9", "10", "11", "12"]
for x in df["Grade Roll"]:
    if x == "Other":
        df["Grade Band"] == "Undefined"
    elif x in k5:
        df["Grade Band"] == "K5"
    elif x in ms:
        df["Grade Band"] == "MS"
    elif x in hs:
        df["Grade Band"] == "HS"
CodePudding user response:
This should be pretty fast:
# Each line builds a boolean mask over "Grade Roll" and fills "Grade Band" for the matching rows.
df.loc[df["Grade Roll"] == "Other", "Grade Band"] = "Undefined"
df.loc[df["Grade Roll"].isin(k5), "Grade Band"] = "K5"
df.loc[df["Grade Roll"].isin(ms), "Grade Band"] = "MS"
df.loc[df["Grade Roll"].isin(hs), "Grade Band"] = "HS"
If you wanted it to be less repetitive, you could store your arrays in a dict:
d = {
    "K5": ["0", "1", "2", "3", "4", "5"],
    "MS": ["6", "7", "8"],
    "HS": ["9", "10", "11", "12"],
    "Undefined": ["Other"],
}
# One vectorized assignment per band, written to "Grade Band".
for k, v in d.items():
    df.loc[df["Grade Roll"].isin(v), "Grade Band"] = k
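For anyone who wants to try this end to end, here is a minimal sketch on a toy frame. The sample values (including the unmatched "PK") and the name `bands` are made up for illustration. It writes the result into a separate "Grade Band" column, as in the question, so the original "Grade Roll" values are kept.

```python
import pandas as pd

# Toy data standing in for the real ~100,000-row frame (values are made up).
df = pd.DataFrame({"Grade Roll": ["0", "7", "12", "Other", "3", "PK"]})

# Hypothetical name for the band -> grades lookup.
bands = {
    "K5": ["0", "1", "2", "3", "4", "5"],
    "MS": ["6", "7", "8"],
    "HS": ["9", "10", "11", "12"],
    "Undefined": ["Other"],
}

# Start with an empty band, as in the question.
df["Grade Band"] = ""

# One vectorized assignment per band instead of one Python-level step per row.
for band, grades in bands.items():
    df.loc[df["Grade Roll"].isin(grades), "Grade Band"] = band

print(df)
```

Rows whose grade is in none of the lists (here "PK") keep the empty string, so they are easy to spot afterwards.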
CodePudding user response:
Use a map.
gradeMap = {"Other": "Undefined"}
gradeMap.update(dict.fromkeys(k5, "K5"))
gradeMap.update(dict.fromkeys(ms, "MS"))
gradeMap.update(dict.fromkeys(hs, "HS"))
for x in df["Grade Roll"]:
    if x in gradeMap:
        # Band for this row's grade.
        gradeMapVal = gradeMap[x]
    else:
        raise Exception('The grade ' + str(x) + ' was not found in the grade map.')
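The same lookup can probably be done without a Python-level loop by passing the dict to `Series.map`, which returns NaN for any value that is not a key. This is a sketch on made-up sample data, not a drop-in for the real frame. Writing the result to "Grade Band" and raising `ValueError` are my assumptions about what is wanted.

```python
import pandas as pd

# Toy data (made up); the real frame has ~100,000 rows.
df = pd.DataFrame({"Grade Roll": ["0", "7", "12", "Other", "3"]})

k5 = ["0", "1", "2", "3", "4", "5"]
ms = ["6", "7", "8"]
hs = ["9", "10", "11", "12"]

gradeMap = {"Other": "Undefined"}
gradeMap.update(dict.fromkeys(k5, "K5"))
gradeMap.update(dict.fromkeys(ms, "MS"))
gradeMap.update(dict.fromkeys(hs, "HS"))

# Look up every grade in one call; grades missing from the dict become NaN.
df["Grade Band"] = df["Grade Roll"].map(gradeMap)

# Same check as the loop version, done once for the whole column.
missing = df.loc[df["Grade Band"].isna(), "Grade Roll"].unique()
if len(missing) > 0:
    raise ValueError(f"Grades not found in the grade map: {list(missing)}")

print(df)
```

If "Grade Roll" itself can contain missing values, those rows would also be reported by the check, so they may need handling first.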