Creating a Function to Count Unique Values Based on Another Column-CodePudding

I have data that look like this.

company_name	new_company_status
A Co.,Ltd	Yes
B. Inc	No
PT XYZ	No
PT DFE, Tbk.	Yes
A Co.,Ltd	Yes
PT DFE, Tbk.	Yes

I want to create a function in python to check every unique company name from 'company_name' column and compare the 'new_company_status', if the 'new_company_status' is "Yes" for every unique company name, it will count as 1 and iterate to get the total number of new company.

So far this is the code that I write: `

def new_comp(DataFrame):
    comp_list = df['Company_Name'].values.tolist
    uniq_comp = set(comp_list)
    for x in uniq_comp:
        if df['Status_New_Company'] == "Yes":
            uniq_comp  = 1
    print('New Companies: ', uniq_comp)

Can anyone help me to complete and/or revise the code? I expect the output is integer to define the total of new company. Thank u in advance.

CodePudding user response：

You can use masks and boolean addition to count the matches:

# keep one company of each
m1 = ~df['company_name'].duplicated()
# is this a yes?
m2 = df['new_company_status'].eq('Yes')

# count cases for which both conditions are True
out = (m1&m2).sum()

Output: 2

If a given company can have both Yes and No and you want to count 1 if there is at least one Yes, you can use a groupby.any:

out = (df['new_company_status']
  .eq('Yes')
 .groupby(df['company_name']).any()
 .sum()
)

Output: 2

CodePudding user response：

If need total unique values of company_name if new_company_status match Yes filter and count length of sets:

N = len(set(df.loc[df['new_company_status'].eq('Yes'), 'company_name']))

If need count number of Yes per company_name to new DataFrame aggregate boolean mask by sum:

df1 = (df['new_company_status'].eq('Yes')
                .groupby(df['company_name'])
                .sum()
                .reset_index(name='countYes'))