Home > front end >  is there a library to simplify this calculation? or other simple way
is there a library to simplify this calculation? or other simple way

Time:06-19

I'm new to python, specially about data.

Ask permission to ask this question.

i'm try to learn to calculate column with another column. Like the following example:

The Data Frame:

ID     Name  Course1  Course2  Course3  Score_course1  Score_Course2  Score_Course3
1       A      4        3.75      3           2              3              3
2       B      2.75      4        4           2              3              3

The code i use:

df['total'] = ((df['Course1'] * df['Score_Course1'])   (df['Course2'] * df['Score_Course2'])   (df['Course3'] * df['Score_Course3']))

but this code is too long. Can I use "for"?.

Or any other suggestions?

CodePudding user response:

I guess the example you provided is a truncated version and you have many other couples [course-score], or even a non constant number of couples. Am I right? If so, I think the best approach woudld be to pivot your Dataframe to a tidyer format like this:

ID Name Course_number Course_value Course_score
1 A 1 4 3.75
1 A 2 3 2
1 A 3 3 3
2 B 1 2.75 4
2 B 2 4 2
2 B 3 3 3

See pandas.melt function to reshape your Dataframe like this.

With this new shape you can then:

  1. Store the the product of Course_value * Course_score into a dedicated column (ie: "product"). See pandas.assign
  2. Group by Name. See pandas.grouby
  3. Summarise your groups by calculating the group-wise sum of the product column

It will produce a Dataframe like this:

ID Name Total
1 A 24.3
2 B 28

CodePudding user response:

Given:

   ID Name  Course1  Course2  Course3  Score_Course1  Score_Course2  Score_Course3
0   1    A     4.00     3.75        3              2              3              3
1   2    B     2.75     4.00        4              2              3              3

Doing:

df2 = pd.wide_to_long(df, i=['ID', 'Name'], stubnames=['Course', 'Score_Course'], j='Course_Num').reset_index()
print(df2)

Output:

   ID Name  Course_Num  Course  Score_Course
0   1    A           1    4.00             2
1   1    A           2    3.75             3
2   1    A           3    3.00             3
3   2    B           1    2.75             2
4   2    B           2    4.00             3
5   2    B           3    4.00             3

Now we can do the desired calculation easily:

# In one go:
df2.groupby(['ID', 'Name']).apply(lambda x: sum(x.Course * x.Score_Course)).reset_index(name='scores')

# OR

# In two steps:
df2['scores'] = df2.Course.mul(df2.Score_Course)
df2.groupby(['ID', 'Name'], as_index=False)[['scores']].sum()

Output:

   ID Name  scores
0   1    A   28.25
1   2    B   29.50

df2 after two-steps version:

   ID Name  Course_Num  Course  Score_Course  scores
0   1    A           1    4.00             2    8.00
1   1    A           2    3.75             3   11.25
2   1    A           3    3.00             3    9.00
3   2    B           1    2.75             2    5.50
4   2    B           2    4.00             3   12.00
5   2    B           3    4.00             3   12.00
  • Related