Home > Back-end >  Python (If, Then) with DataFrames
Python (If, Then) with DataFrames

Time:10-28

I have a data frame with months (by year), and ID number. I am trying to calculate the attrition rate, but I am getting stuck on obtaining unique ID counts when a month equals a certain month in pandas.

ID. Month
1 Sept. 2022
2 Oct. 2022

etc... with possible duplicates in ID and 1.75 years worth of data.

import pandas as pd 

path = some path on my computer 
data = pd.read_excel(path)

if data["Month"] == "Sept. 2022":
   ID_SEPT = data["ID."].unique()
return ID_SEPT

I am trying to discover what I am doing incorrect here in this if-then statement. Ideally I am trying to collect all the unique ID values per each month per year to then calculate the attrition rate. Is there something obvious I am doing wrong here?

Thank you.

I tried an id-then statement and I was expecting unique value counts of ID per month.

CodePudding user response:

The way you do this with a dataframe, conceptually, is to filter the entire dataframe to be just the rows where your comparison is true, and then do whatever (get uniques) from there.

That would look like this:

filtered_df = df[df['Month'] == 'Sept. 2022']
ids_sept = list(filtered_df['ID.'].unique()

The first line there can look at a little strange, but what it is doing is:

  1. df['Month'] == 'Sept. 2022' will return an array/column/series (it actually returns a series) of True/False whether or not the comparison is, well, true or false.
  2. You then run that series of bools through df[series_of_bools] that filters the dataframe to return only the rows where it is True.

Thus, you have a filter.

If you are looking for the number of unique items, rather than the list of unique items, you can also use filtered_df['ID.'].nunique() and save yourself the step later of getting the length of the list.

CodePudding user response:

You are looking for pandas.groupby.

Use it like this to get the unique values of each Group (Month)

data.groupby("Month")["ID."].unique()  # You have a . after ID in your example, check if thats correct

CodePudding user response:

You need to use one of the iterator functions, like items().

for (columnName, columnData) in data.iteritems():
    if columnName = 'Month'
         [code]

CodePudding user response:

try this

data[data.Month=='Sept. 2022']['ID.'].unique()
  • Related