I am new to programming and Python. Lately, I am learning to use pandas.
What I would like to know
I am wondering what would be the best approach to work only on numbers related to Group II (in the attached DataFrame). I mean e.g. sum all grades for group II and column 'Project'. Sure it won't make sense to sum grades, but the data is just for illustration purposes.
I'd be grateful for any advices and suggestions.
My DataFrame
The code attached will generate random numbers (except for the 'Group' column) but the DataFrame will always be like that:
Name Album Group Colloquium_1 Colloquium_2 Project
# 0 B 61738 I 5 4 5
# 1 Z 44071 I 5 5 2
# 2 M 87060 I 5 5 5
# 3 L 67974 I 3 5 3
# 4 Z 15617 I 3 2 3
# 5 Z 91872 II 2 4 5
# 6 H 84685 II 4 2 5
# 7 T 17943 II 2 5 2
# 8 L 54302 II 2 5 3
# 9 O 53433 II 5 4 5
Code to generate my DataFrame:
import pandas as pd
import random as rd
def gen_num():
num = ""
for i in range(5):
num = str(rd.randint(0,9))
return num
names = ['A','B','C','D','E','F','G','H','I','J','K', 'L','M','N','O', \
'P','R','S','T','W','Z']
list_names = []
list_album = []
list_group = []
list_coll_1 = []
list_coll_2 = []
list_project = []
num_of_students = 10
for i in range(num_of_students):
list_names.append(rd.choice(names))
list_album.append(gen_num())
list_coll_1.append(rd.randint(2, 5))
list_coll_2.append(rd.randint(2, 5))
list_project.append(rd.randint(2, 5))
if i < (num_of_students / 2):
list_group.append('I')
else:
list_group.append('II')
group = pd.DataFrame(list_names)
group.set_axis(['Name'], axis=1, inplace=True)
group['Album'] = list_album
group['Group'] = list_group
group['Colloquium_1'] = list_coll_1
group['Colloquium_2'] = list_coll_2
group['Project'] = list_project
CodePudding user response:
One solution to this is to filter the DataFrame first:
group[group["Group"] == "II"]["Project"].sum()
#Out: 18
Breaking this up into parts:
First, this part returns a series of bools (True/False) for each row as to whether the values in "Group" are equal to "II"
:
group["Group"] == "II"
#0 False
#1 False
#2 False
#3 False
#4 False
#5 True
#6 True
#7 True
#8 True
#9 True
#Name: Group, dtype: bool
Next, writing this into group[]
returns a filtered dataframe for those rows that are True
:
group[group["Group"] == "II"]
# Name Album Group Colloquium_1 Colloquium_2 Project
#5 E 77371 II 4 5 3
#6 N 90525 II 4 3 3
#7 H 89889 II 3 4 5
#8 T 88154 II 3 4 5
#9 E 56176 II 3 2 2
Using ["Project"]
on the end returns a pandas Series of the values in the column:
group[group["Group"] == "II"]["Project"]
#5 3
#6 3
#7 5
#8 5
#9 2
#Name: Project, dtype: int64
And lastly .sum()
returns the sum of the series (18).
CodePudding user response:
You can use the DataFrame.groupby
function to analyze data from one or many columns based on "groups" defined in other columns.
For example, something like
group.groupby('Group')['Project'].sum()
Or,you could use masking if you only want the result:
group[group['Group']=='II']['Project'].sum()