I'm new to python, specially about data.
Ask permission to ask this question.
i'm try to learn to calculate column with another column. Like the following example:
The Data Frame:
ID Name Course1 Course2 Course3 Score_course1 Score_Course2 Score_Course3
1 A 4 3.75 3 2 3 3
2 B 2.75 4 4 2 3 3
The code i use:
df['total'] = ((df['Course1'] * df['Score_Course1']) (df['Course2'] * df['Score_Course2']) (df['Course3'] * df['Score_Course3']))
but this code is too long. Can I use "for"?.
Or any other suggestions?
CodePudding user response:
I guess the example you provided is a truncated version and you have many other couples [course-score], or even a non constant number of couples. Am I right? If so, I think the best approach woudld be to pivot your Dataframe to a tidyer format like this:
ID | Name | Course_number | Course_value | Course_score |
---|---|---|---|---|
1 | A | 1 | 4 | 3.75 |
1 | A | 2 | 3 | 2 |
1 | A | 3 | 3 | 3 |
2 | B | 1 | 2.75 | 4 |
2 | B | 2 | 4 | 2 |
2 | B | 3 | 3 | 3 |
See pandas.melt function to reshape your Dataframe like this.
With this new shape you can then:
- Store the the product of
Course_value * Course_score
into a dedicated column (ie: "product"). See pandas.assign - Group by Name. See pandas.grouby
- Summarise your groups by calculating the group-wise sum of the product column
It will produce a Dataframe like this:
ID | Name | Total |
---|---|---|
1 | A | 24.3 |
2 | B | 28 |
CodePudding user response:
Given:
ID Name Course1 Course2 Course3 Score_Course1 Score_Course2 Score_Course3
0 1 A 4.00 3.75 3 2 3 3
1 2 B 2.75 4.00 4 2 3 3
Doing:
df2 = pd.wide_to_long(df, i=['ID', 'Name'], stubnames=['Course', 'Score_Course'], j='Course_Num').reset_index()
print(df2)
Output:
ID Name Course_Num Course Score_Course
0 1 A 1 4.00 2
1 1 A 2 3.75 3
2 1 A 3 3.00 3
3 2 B 1 2.75 2
4 2 B 2 4.00 3
5 2 B 3 4.00 3
Now we can do the desired calculation easily:
# In one go:
df2.groupby(['ID', 'Name']).apply(lambda x: sum(x.Course * x.Score_Course)).reset_index(name='scores')
# OR
# In two steps:
df2['scores'] = df2.Course.mul(df2.Score_Course)
df2.groupby(['ID', 'Name'], as_index=False)[['scores']].sum()
Output:
ID Name scores
0 1 A 28.25
1 2 B 29.50
df2
after two-steps version:
ID Name Course_Num Course Score_Course scores
0 1 A 1 4.00 2 8.00
1 1 A 2 3.75 3 11.25
2 1 A 3 3.00 3 9.00
3 2 B 1 2.75 2 5.50
4 2 B 2 4.00 3 12.00
5 2 B 3 4.00 3 12.00