Calculate the percentage by reading values between two specific columns and add percentage column-CodePudding

I have dataframe like this

Keyword                   devops       aws       ec2   python     java    dotnet
optional/mandatory        optional mandatory mandatory optional optional mandatory
Candidate Name                                                         
User1                       1          1          1        1        1         1
User2                       1          0          0        1        1         1
...                        ...        ...        ...      ...      ...       ...

User67                      1          1          1        0        0         0

I want to calculate percentage match for each Candidate Name and add new Percentage match column also the row optional/mandatory play important role in calculating percentage. Like the keyword aws ec2 dotnet are mandatory if there count is 0 then set the percentage match for that Candidate to 0%. Means all mandatory keywords must have count 1 then only calculate the percentage of that cancidate.

Expected Output:

Keyword                   devops       aws       ec2   python     java    dotnet    Percentage
optional/mandatory        optional mandatory mandatory optional optional mandatory    Match
Candidate Name                                                         
User1                       1          1          1        1        1         1        100%
User2                       1          1          1        0        0         1        66.66%
...                        ...        ...        ...      ...      ...       ...

User67                      1          1          1        0        0         0         0%

for calculate percentage:

User1 is 100% as all mandatory have count 1

User2 is 66.66% as two optional have 0 count then Total Number of ons's/Total Keywords*100 (4/6)100

User3 is 0% as out of 3 mandatory one have zero count to directly the percentage is set to 0% irrespective of the optional keyword count

Keyword row and Candidate Name column keeps changing there can be more keywords and Candidate names in future. How to Implement the desire output.

My Code:

import numpy as np
import pandas as pd
read_csv = pd.read_csv('Sample.csv')
df = pd.DataFrame(read_csv)
df["Count"] = np.where(df["Count"] > 0, 1, 0)
print(df)
df = df.pivot_table(index="Candidate Name", columns=["optional/mandatory","Keyword"], values="Count")
print(df)

Output:

Candidate Name    optional/mandatory Keyword  Count
      User1       optional           devops      1
      User1      mandatory              aws      0
      User1      mandatory              ec2      1
      User1       optional           python      1
      User1       optional             java      1
      User1      mandatory           dotnet      0
      User2       optional           devops      1
      User2      mandatory              aws      1
      User2      mandatory              ec2      0
      User3       optional           devops      1
      User3      mandatory              ec2      1
      User3      mandatory              aws      0
      User3       optional             java      1

optional/mandatory        mandatory             optional            
Keyword              aws dotnet  ec2   devops java python
Candidate Name                                           
    User1            0.0    0.0  1.0      1.0  1.0    1.0
    User2            1.0    NaN  0.0      1.0  NaN    NaN
    User3            0.0    NaN  1.0      1.0  1.0    NaN

CodePudding user response：

I took the liberty of reformatting your question a bit. Here's the setup:

import pandas as pd

# Set up your indices
cols = pd.MultiIndex.from_arrays([
    ['optional', 'mandatory', 'mandatory', 'mandatory', 'optional', 'optional'],
    ['devops', 'aws', 'ec2', 'dotnet', 'python', 'java']
])

# Set up the data however make sense, 
# it's nice to keep name and values together
data = [
    ['User 1', 1,1,1,1,1,1],
    ['User 2',1,0,0,1,1,1],
    # ...,
    ['User N',0,1,1,1,0,0]
]

# We can un-pack the name (index) and values with 
# slicing and list comprehensions
df = pd.DataFrame(
    columns=cols,
    data=[applicant[1:] for applicant in data],
    index=pd.Index([applicant[0] for applicant in data], name='Candidate Name')
)

# This sorts by index hirecery so you'll get all the 
# mandatory items then all the optional items
df.sort_index(axis='columns', inplace=True)

And then, we can use apply to run a function against each of the rows:

# You could use a lambda here but for readability 
#     we can also spell it out with a named function
# first we'll check if we have all mandatory items 
#     satisfied and then decide if it's a 0-coverage 
#     or percentage-coverage
def calculate_skill_coverage_percent(r):
    if sum(r['mandatory']) < len(r['mandatory']):
        return 0
    else:
        return sum(r) / len(r)
df['Percent_Match'] = df.apply(calculate_skill_coverage_percent, axis='columns')

print(df)

produces

               mandatory            optional             Percent_Match
                     aws dotnet ec2   devops java python              
Candidate Name                                                        
User 1                 1      1   1        1    1      1           1.0
User 2                 0      1   0        1    1      1           0.0
...
User N                 1      1   1        0    0      0           0.5

CodePudding user response：

Creation of dataframe df

data={'Names': [ '','User1', 'User2','User67','User9080'],
    'devops': ['optional',1,1,1,1],
    'aws' : ['mandatory',1,0,1,1],
    'ec2' : ['mandatory',1,0,1,1],
    'python': ['optional',1,1,0,0],
    'java' : ['optional',1,1,0,0],
    'dotnet' : ['mandatory',1,1,0,1]}

df=pd.DataFrame(data)

df



# **Out:**

   Names    devops  aws       ec2       python   java     dotnet
0          optional mandatory mandatory optional optional mandatory
1   User1  1        1         1         1        1        1
2   User2  1        0         0         1        1        1
3   User67 1        1         1         0        0        0
4   User90 1        1         1         0        0        1

Data preparation:

# get as row col names
import numpy as np
df=pd.DataFrame(np.vstack([df.columns, df]))
# get as column mandatorys row
df.columns= df.iloc[1]
#elimination as row (duplicated)
df=df.drop(1)

# set "users" column as index for rows (disturbs later if not)
df.set_index('', inplace=True)

# saving and wiping old column names (python,aws,etc) disturbs later
names=df.iloc[0].copy()
df=df.drop('Names')
df


# ***Out***

1   optional mandatory mandatory optional optional mandatory
User1   1        1         1        1        1        1
User2   1        0         0        1        1        1
User67  1        1         1        0        0        0
User90  1        1         1        0        0        1

# lenght of items (to calculate %)
items=len(df.columns)

# Function
def condiciones(df):
if df['mandatory'].all() == 1  :
    return (df.sum(axis=0) / items) *100
else:
    return 0



#apply
df['ya']=df.apply(condiciones,axis=1)
df




# *** Out:

1     optional mandatory    mandatory optional  optional mandatory  ya
User1   1      1            1          1         1       1   100.000000
User2   1      0            0          1         1       1    0.000000
User67  1      1            1          0         0       0    0.000000
User90  1      1            1          0         0       1    66.666667

Reinserting

# rescuing users from index
df['Users'] = df.index
# Rescuing wiped row (aws,python...)
names.to_list()   # add 'SCORE' at the end, is one more row
                  # add 'Users' at the end, as index is now row

df = (df.T.reset_index().T.reset_index(drop=True)
        .set_axis(['devops', 'aws', 'ec2', 'python', 'java', 
        'dotnet',"SCORE","Users"], axis=1))

# Moving Users to first
cols = list(df.columns)
cols = [cols[-1]]   cols[:-1]
df = df[cols]

Result

Out:




    Users   devops  aws      ec2        python   java     dotnet    SCORE
 0  Users optional mandatory mandatory  optional optional mandatory ya
 1  User1   1       1        1          1        1        1        100.0
 2  User2   1       0        0          1        1        1        0.0
 3  User67  1       1        1          0        0        0        0.0
 4  User90  1       1        1          0        0        1        66.67