I have a for loop that generates a new value from one DataFrame column; how do I replace the value in another column?

Here's an example of my DataFrame

               CVE ID  Vulnerability ID  Severity          Fix Status  CVSS
0        ALPINE-13661                46       low  fixed in 1.32.1-r8   0.0
1       CVE-2012-5784                47  moderate                open   4.0
2       CVE-2013-0169               411       low                None   2.6
3       CVE-2014-0429               411  critical                None  10.0
4       CVE-2014-0432               411  critical                None   9.3
..                ...               ...       ...                 ...   ...
622  PRISMA-2022-0049                49      high      fixed in 2.0.1   8.0
623  PRISMA-2022-0168               410      high                open   7.8
624  PRISMA-2022-0227               416      high                open   7.5
625  PRISMA-2022-0239                47      high      fixed in 4.9.2   7.5
626  PRISMA-2022-0270               416    medium                open   5.4

Currently I have a for loop that iterates over the CVSS column and generates a new 'Severity' value, called s (the new value will be "Low", "Moderate", or "High"). How do I replace the old value in the 'Severity' column with my new value of s?

Main.py

def main():
    dataframe = csv_to_df()
    severity_levels(dataframe)

def csv_to_df():
    input_csv = pd.read_csv(f"{input_csv_filename}.csv")

    unique_df = input_csv.drop_duplicates(subset=["CVE ID", "ID"]).groupby("CVE ID", as_index=False).agg(dict.fromkeys(input_csv.columns, "first") | {"ID": ", ".join})

    df = unique_df[['CVE ID', 'Vulnerability ID', 'Severity', 'Fix Status', 'CVSS']] 
    return df

def severity_levels(df):
    for cvssv3 in df[['CVSS']].values:
        cvss = float(cvssv3)
        if cvss < 4.0:
            s = "Low"
        elif cvss >= 4 and cvss < 7:
            s = "Moderate"
        else:
            s = "High"

CodePudding user response:

Avoid loops in pandas. Use vectorized functions if you can:

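import numpy as np
import pandas as pd

def main():
    dataframe = csv_to_df()
    dataframe["Severity"] = pd.cut(dataframe["CVSS"], [-np.inf, 4, 7, np.inf], labels=["Low", "Moderate", "High"], right=False)

pd.cut assigns labels based on your bin ranges. Passing right=False makes each bin closed on the left and open on the right, which matches the comparisons in your loop (the default, right=True, would put 4.0 in Low and 7.0 in Moderate):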

[-np.inf, 4) -> Low
[4, 7)       -> Moderate
[7, np.inf)  -> High
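
To check the boundary behaviour, here is a minimal, self-contained sketch of severity_levels rewritten around pd.cut. The sample rows are hypothetical stand-ins for the real CSV data, and returning a modified copy (rather than editing the frame in place) is just one possible design, not necessarily what the original script needs.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frame comes from csv_to_df().
df = pd.DataFrame({
    "CVE ID": ["EXAMPLE-1", "EXAMPLE-2", "EXAMPLE-3", "EXAMPLE-4"],
    "Severity": ["low", "moderate", "critical", "high"],
    "CVSS": [0.0, 4.0, 10.0, 7.0],
})

def severity_levels(df):
    """Return a copy of df with 'Severity' recomputed from 'CVSS'."""
    out = df.copy()  # leave the caller's frame untouched
    out["Severity"] = pd.cut(
        out["CVSS"],
        bins=[-np.inf, 4, 7, np.inf],
        labels=["Low", "Moderate", "High"],
        right=False,  # left-closed bins: [-inf, 4), [4, 7), [7, inf)
    )
    return out

result = severity_levels(df)
print(result[["CVSS", "Severity"]])
```

With these rows, 4.0 lands in Moderate and 7.0 in High, the same as the if/elif chain in the question. Note that pd.cut returns a categorical column; if plain strings are needed downstream, the column can be converted afterwards.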