I have data that look like this.
company_name | new_company_status |
---|---|
A Co.,Ltd | Yes |
B. Inc | No |
PT XYZ | No |
PT DFE, Tbk. | Yes |
A Co.,Ltd | Yes |
PT DFE, Tbk. | Yes |
I want to create a function in python to check every unique company name from 'company_name' column and compare the 'new_company_status', if the 'new_company_status' is "Yes" for every unique company name, it will count as 1 and iterate to get the total number of new company.
So far this is the code that I write: `
def new_comp(DataFrame):
comp_list = df['Company_Name'].values.tolist
uniq_comp = set(comp_list)
for x in uniq_comp:
if df['Status_New_Company'] == "Yes":
uniq_comp = 1
print('New Companies: ', uniq_comp)
`
Can anyone help me to complete and/or revise the code? I expect the output is integer to define the total of new company. Thank u in advance.
CodePudding user response:
You can use masks and boolean addition to count the matches:
# keep one company of each
m1 = ~df['company_name'].duplicated()
# is this a yes?
m2 = df['new_company_status'].eq('Yes')
# count cases for which both conditions are True
out = (m1&m2).sum()
Output: 2
If a given company can have both Yes and No and you want to count 1 if there is at least one Yes, you can use a groupby.any
:
out = (df['new_company_status']
.eq('Yes')
.groupby(df['company_name']).any()
.sum()
)
Output: 2
CodePudding user response:
If need total unique values of company_name
if new_company_status
match Yes
filter and count length of sets:
N = len(set(df.loc[df['new_company_status'].eq('Yes'), 'company_name']))
If need count number of Yes
per company_name
to new DataFrame aggregate boolean mask by sum
:
df1 = (df['new_company_status'].eq('Yes')
.groupby(df['company_name'])
.sum()
.reset_index(name='countYes'))