Add a new column to a Pandas dataframe with a value from a function

Time:04-28

I know this is similar to other questions but I can't find a solution that I can make work.

I have a dataframe that contains grades that looks similar to this:

  subj1 subj2 subj3 subj4
0   A     B     A     B
1   B     B     C     B
2   C     C     B     A

I want to append a GPA score in a new column so that the result is this:

  subj1 subj2 subj3 subj4 GPA
0   A     B     A     B   3.5
1   B     B     C     B   2.8
2   C     D     B     A   2.5

the function I use to calculate the GPA is this:

def calcgpa():
    for row in df.itertuples(index=False):
        tot = 0
        c = 0
        GPA = 0
        for i in range(len(row)):
            if row[i] == "A":
                tot = tot + 4
                c += 1
            elif row[i] == "B":
                tot = tot + 3
                c += 1
            elif row[i] == "C":
                tot = tot + 2
                c += 1
            elif row[i] == "D":
                tot = tot + 1
                c += 1
            else:
                c += 1
        GPA = tot / c
        return GPA

I thought that df["GPA"] = pd.Series(calcgpa()) would work but it only adds a value to the first row. All others are NaN. Trying to use pd.apply or pd.assign just gave me an AssertionError.

Is the problem with how the function returns the GPA or what is the proper syntax I need to add the new column?

CodePudding user response:

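If you look at the output of calcgpa(), it is a single float (3.5), not a list of GPAs, because the return statement sits inside the outer for loop and the function exits after the first row. That is why your column gets only one value and the rest are NaN.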
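
To see where the NaNs come from, here is a minimal sketch using a cut-down, two-column version of the dataframe (the shortened frame is only for illustration). A Series built from one scalar has a single entry labelled 0, and column assignment aligns on index labels:

```python
import pandas as pd

# Cut-down stand-in for the question's dataframe (illustration only).
df = pd.DataFrame({"subj1": ["A", "B", "C"], "subj2": ["B", "B", "C"]})

# A Series built from one scalar holds a single entry with index label 0.
s = pd.Series(3.5)

# Assignment aligns on index labels: rows 1 and 2 have no match in s,
# so they are filled with NaN.
df["GPA"] = s
gpa_values = df["GPA"].tolist()
```

Only the row whose label matches the Series gets a value, which matches what you are seeing.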

I would suggest storing each GPA value in a list and assigning that list as the column instead. This requires a few small changes to your code:

Replace GPA = 0 with GPA = [] and move it to the top of the function, outside both for loops. Change GPA = tot / c to GPA.append(tot / c) so that each row's GPA is appended to the list. Finally, dedent return GPA so that it runs once, after the outer loop has finished.

Full code:

def calcgpa():
    GPA = []
    for row in df.itertuples(index=False):
        tot = 0
        c = 0
        for i in range(len(row)):
            if row[i] == "A":
                tot = tot + 4
                c += 1
            elif row[i] == "B":
                tot = tot + 3
                c += 1
            elif row[i] == "C":
                tot = tot + 2
                c += 1
            elif row[i] == "D":
                tot = tot + 1
                c += 1
            else:
                c += 1
        GPA.append(tot / c)
    return GPA

You can then assign this to the GPA column like this:

df["GPA"] = calcgpa()

Output:

  subj1 subj2 subj3 subj4   GPA
0     A     B     A     B  3.50
1     B     B     C     B  2.75
2     C     C     B     A  2.75
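
Since the question mentions trying apply, here is one way that could look. This is a sketch, not the only way to do it: POINTS and row_gpa are names I made up, and it assumes any grade outside A-D should count as 0 points, as the else branch does.

```python
import pandas as pd

# Grade-to-points lookup taken from the if/elif chain in the question.
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}

def row_gpa(row):
    # Grades missing from POINTS contribute 0 points but still count
    # towards the number of subjects, like the else branch.
    return sum(POINTS.get(grade, 0) for grade in row) / len(row)

df = pd.DataFrame({
    "subj1": ["A", "B", "C"],
    "subj2": ["B", "B", "C"],
    "subj3": ["A", "C", "B"],
    "subj4": ["B", "B", "A"],
})

# axis=1 passes each row to row_gpa as a Series and returns one value per row.
df["GPA"] = df.apply(row_gpa, axis=1)
```

The function works on one row at a time and never touches the global df, so the "return too early" problem cannot happen.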

As the other answer shows, there are more efficient ways to achieve this, but since your code was close I have only made the changes needed to get the result.

CodePudding user response:

This assumes the only grades are A-E; if any other values can appear, replace them with zero first. You can then do:

df['GPA'] = df.replace({'A': 4, 'B': 3, 'C': 2, 'D': 1, 'E': 0}).mean(axis=1)

df 
  subj1 subj2 subj3 subj4   GPA
0     A     B     A     B  3.50
1     B     B     C     B  2.75
2     C     C     B     A  2.75
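
If unexpected grades can show up, one possible way to zero them out is to map each column through the lookup and fill the gaps. This is a sketch under my own assumptions: the "X" grade is invented for the example, and POINTS and points are illustrative names.

```python
import pandas as pd

POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}

df = pd.DataFrame({
    "subj1": ["A", "B", "C"],
    "subj2": ["B", "B", "X"],  # "X" is a made-up unrecognised grade
    "subj3": ["A", "C", "B"],
    "subj4": ["B", "B", "A"],
})

# Series.map with a dict turns grades missing from the dict into NaN;
# fillna(0) then counts them as zero points.
points = df.apply(lambda col: col.map(POINTS)).fillna(0)

# Row-wise mean over the subject columns only (computed before GPA is added).
df["GPA"] = points.mean(axis=1)
```

Computing points before adding the GPA column keeps the mean from including a previous GPA value if you run it twice.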