Python (If, Then) with DataFrames-CodePudding

I have a data frame with months (by year), and ID number. I am trying to calculate the attrition rate, but I am getting stuck on obtaining unique ID counts when a month equals a certain month in pandas.

ID.	Month
1	Sept. 2022
2	Oct. 2022

etc... with possible duplicates in ID and 1.75 years worth of data.

import pandas as pd 

path = some path on my computer 
data = pd.read_excel(path)

if data["Month"] == "Sept. 2022":
   ID_SEPT = data["ID."].unique()
return ID_SEPT

I am trying to discover what I am doing incorrect here in this if-then statement. Ideally I am trying to collect all the unique ID values per each month per year to then calculate the attrition rate. Is there something obvious I am doing wrong here?

Thank you.

I tried an id-then statement and I was expecting unique value counts of ID per month.

CodePudding user response：

The way you do this with a dataframe, conceptually, is to filter the entire dataframe to be just the rows where your comparison is true, and then do whatever (get uniques) from there.

That would look like this:

filtered_df = df[df['Month'] == 'Sept. 2022']
ids_sept = list(filtered_df['ID.'].unique()

The first line there can look at a little strange, but what it is doing is:

df['Month'] == 'Sept. 2022' will return an array/column/series (it actually returns a series) of True/False whether or not the comparison is, well, true or false.
You then run that series of bools through df[series_of_bools] that filters the dataframe to return only the rows where it is True.

Thus, you have a filter.

If you are looking for the number of unique items, rather than the list of unique items, you can also use filtered_df['ID.'].nunique() and save yourself the step later of getting the length of the list.

CodePudding user response：

You are looking for pandas.groupby.

Use it like this to get the unique values of each Group (Month)

data.groupby("Month")["ID."].unique()  # You have a . after ID in your example, check if thats correct

CodePudding user response：

You need to use one of the iterator functions, like items().

for (columnName, columnData) in data.iteritems():
    if columnName = 'Month'
         [code]

CodePudding user response：

try this

data[data.Month=='Sept. 2022']['ID.'].unique()