Here's an example of my DataFrame:
               CVE ID  Vulnerability ID  Severity          Fix Status  CVSS
0        ALPINE-13661                46       low  fixed in 1.32.1-r8   0.0
1       CVE-2012-5784                47  moderate                open   4.0
2       CVE-2013-0169               411       low                None   2.6
3       CVE-2014-0429               411  critical                None  10.0
4       CVE-2014-0432               411  critical                None   9.3
..                ...               ...       ...                 ...   ...
622  PRISMA-2022-0049                49      high      fixed in 2.0.1   8.0
623  PRISMA-2022-0168               410      high                open   7.8
624  PRISMA-2022-0227               416      high                open   7.5
625  PRISMA-2022-0239                47      high      fixed in 4.9.2   7.5
626  PRISMA-2022-0270               416    medium                open   5.4
Currently I have a for loop that iterates over the CVSS column and computes a new 'Severity' value, called s (the new value will be "Low", "Moderate", or "High"). How do I replace the old value in the 'Severity' column with my new value of s?
Main.py
def main():
    dataframe = csv_to_df()
    severity_levels(dataframe)

def csv_to_df():
    input_csv = pd.read_csv(f"{input_csv_filename}.csv")
    unique_df = input_csv.drop_duplicates(subset=["CVE ID", "ID"]).groupby("CVE ID", as_index=False).agg(dict.fromkeys(input_csv.columns, "first") | {"ID": ", ".join})
    df = unique_df[['CVE ID', 'Vulnerability ID', 'Severity', 'Fix Status', 'CVSS']]
    return df

def severity_levels(df):
    for cvssv3 in df[['CVSS']].values:
        cvss = float(cvssv3)
        if cvss < 4.0:
            s = "Low"
        elif cvss >= 4 and cvss < 7:
            s = "Moderate"
        else:
            s = "High"
CodePudding user response:
Avoid loops in pandas. Use vectorized functions if you can:
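def main():
    dataframe = csv_to_df()
    # right=False closes each bin on the left, matching the question's "<" comparisons
    dataframe["Severity"] = pd.cut(dataframe["CVSS"], [-np.inf, 4, 7, np.inf], labels=["Low", "Moderate", "High"], right=False)

This needs import numpy as np in addition to pandas. pd.cut assigns labels based on your bin ranges. Its bins are closed on the right by default, so right=False is what gives the left-closed ranges used in your loop:
[-np.inf, 4) -> Low
[4, 7) -> Moderate
[7, np.inf) -> High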
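
To check how the bin edges behave, here is a minimal, self-contained sketch. The sample DataFrame is made up for illustration (the CVE IDs and scores are not from the question's data), and the scores are picked to land exactly on the boundaries 4 and 7. It passes right=False explicitly, since pd.cut otherwise treats bins as closed on the right, which would put a score of exactly 4.0 into "Low".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the scores deliberately hit the bin edges.
df = pd.DataFrame({
    "CVE ID": ["CVE-A", "CVE-B", "CVE-C", "CVE-D", "CVE-E"],
    "Severity": ["low", "moderate", "moderate", "high", "critical"],
    "CVSS": [0.0, 3.9, 4.0, 7.0, 10.0],
})

# right=False gives left-closed bins: [-inf, 4), [4, 7), [7, inf)
df["Severity"] = pd.cut(
    df["CVSS"],
    bins=[-np.inf, 4, 7, np.inf],
    labels=["Low", "Moderate", "High"],
    right=False,
)

# pd.cut returns a categorical column; cast it if plain strings are preferred.
df["Severity"] = df["Severity"].astype(str)

print(df)
```

The old 'Severity' values are overwritten in one assignment, so no loop or intermediate variable s is needed. The final astype(str) is optional; leaving the column categorical keeps the Low/Moderate/High ordering, which can be handy for sorting.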