I am new to programming and Python. Lately, I am learning to use pandas.

What I would like to know

I am wondering what would be the best approach to work only on numbers related to Group II (in the attached DataFrame). I mean e.g. sum all grades for group II and column 'Project'. Sure it won't make sense to sum grades, but the data is just for illustration purposes.

I'd be grateful for any advices and suggestions.

My DataFrame

The code attached will generate random numbers (except for the 'Group' column) but the DataFrame will always be like that:

     Name Album  Group  Colloquium_1  Colloquium_2  Project
# 0    B  61738     I             5             4        5
# 1    Z  44071     I             5             5        2
# 2    M  87060     I             5             5        5
# 3    L  67974     I             3             5        3
# 4    Z  15617     I             3             2        3
# 5    Z  91872    II             2             4        5
# 6    H  84685    II             4             2        5
# 7    T  17943    II             2             5        2
# 8    L  54302    II             2             5        3
# 9    O  53433    II             5             4        5

Code to generate my DataFrame:

import pandas as pd
import random as rd

def gen_num():
    num = ""
    for i in range(5):
        num  = str(rd.randint(0,9))
    return num

names = ['A','B','C','D','E','F','G','H','I','J','K', 'L','M','N','O', \
             'P','R','S','T','W','Z']

list_names = []
list_album = []
list_group = []
list_coll_1 = []
list_coll_2 = []
list_project = []  

num_of_students = 10

for i in range(num_of_students):
    
    list_names.append(rd.choice(names))
    list_album.append(gen_num())
    list_coll_1.append(rd.randint(2, 5))
    list_coll_2.append(rd.randint(2, 5))
    list_project.append(rd.randint(2, 5)) 
    if i < (num_of_students / 2):
        list_group.append('I')
    else:
        list_group.append('II')
        
group = pd.DataFrame(list_names)
group.set_axis(['Name'], axis=1, inplace=True)
group['Album'] = list_album
group['Group'] = list_group
group['Colloquium_1'] = list_coll_1
group['Colloquium_2'] = list_coll_2
group['Project'] = list_project

CodePudding user response：

One solution to this is to filter the DataFrame first:

group[group["Group"] == "II"]["Project"].sum()
#Out: 18

Breaking this up into parts:

First, this part returns a series of bools (True/False) for each row as to whether the values in "Group" are equal to "II":

group["Group"] == "II"
#0    False
#1    False
#2    False
#3    False
#4    False
#5     True
#6     True
#7     True
#8     True
#9     True
#Name: Group, dtype: bool

Next, writing this into group[] returns a filtered dataframe for those rows that are True:

group[group["Group"] == "II"]
#  Name  Album Group  Colloquium_1  Colloquium_2  Project
#5    E  77371    II             4             5        3
#6    N  90525    II             4             3        3
#7    H  89889    II             3             4        5
#8    T  88154    II             3             4        5
#9    E  56176    II             3             2        2

Using ["Project"] on the end returns a pandas Series of the values in the column:

group[group["Group"] == "II"]["Project"]
#5    3
#6    3
#7    5
#8    5
#9    2
#Name: Project, dtype: int64

And lastly .sum() returns the sum of the series (18).

CodePudding user response：

You can use the DataFrame.groupby function to analyze data from one or many columns based on "groups" defined in other columns.

For example, something like

group.groupby('Group')['Project'].sum()

Or,you could use masking if you only want the result:

group[group['Group']=='II']['Project'].sum()