How can I extract the list of skills from the job_skills column?

Time:11-07

I'm trying to turn every skill in job_skills into its own attribute (column), encoded as zero or one. How can I do that?

Note: I'm trying to create a data frame, but filling it in manually (the code is below) isn't practical. I'm looking for a way to extract the list of skills from the column, because I need to apply ML algorithms to this data.

import pandas as pd

data = [['a', ['Python', 'UI', ' Information Technology (IT)', 'Software Development', 'GTK', 'English', ' Software Engineering']],
        ['b', ['Python', 'Relational Databases', ' Celery', ' VMWare', 'Django', 'Continous Integration', ' Test Driven Development', ' HTTP']],
        ['c', ['Flask', 'Python', ' Celery', ' Software Development', ' Computer Science', 'Information Technology (IT)']],
        ['c', ['Flask', 'Python', ' Celery', ' Software Development', ' Computer Science', 'Information Technology (IT)']]]
df1 = pd.DataFrame(data, columns=['col1', 'col2'])

# One line per skill, strip stray spaces so ' Celery' and 'Celery' share a column,
# one-hot encode, then collapse back to one line per original row
pd.get_dummies(df1['col2'].explode().str.strip()).groupby(level=0).sum()



CodePudding user response:

I can't think of anything built into pandas that does this in a single step. If I understand correctly, you want one-hot variables for each skill, for each job (row). Do you have a unique identifier for each job? If not, you need one. In the example below I use the row index.

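skills = []
rows = []

# Collect one (row index, skill) pair per skill in each job's list
for index, record in df.iterrows():
    for item in record['job_skills']:
        rows.append(index)
        skills.append(item)

# Long format: one line per (row, skill) pair
df = pd.DataFrame({'row': rows, 'skills': skills})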
 

Once you have df, with one line per (row, skill) pair, you can one-hot encode the skills column and aggregate by row to get one 0/1 column per skill.
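One way to do that last step is sketched below. This is only an illustration, not necessarily what the answer originally linked to: the small long_df frame is made-up sample data in the same row/skills layout, and pd.crosstab is a substitute technique for counting each (row, skill) pair.

```python
import pandas as pd

# Hypothetical long-format frame: one line per (row, skill) pair
long_df = pd.DataFrame({
    'row':    [0, 0, 1, 1, 2],
    'skills': ['Python', 'UI', 'Python', 'Django', 'Flask'],
})

# Count each (row, skill) pair, then cap the counts at 1 to get 0/1 flags
one_hot = pd.crosstab(long_df['row'], long_df['skills']).clip(upper=1)
print(one_hot)
```

The result has one line per original row and one column per distinct skill, so it can be joined back to the original frame on the row index.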

# Input used:

print(df.to_string())

    job_title    company    location                                                                                             job_skills
0   Python Or    ItsTime   Oakville,          ['Python', 'UI', 'Computer Science', '. Information Technology (IT)', 'Software Development']
1  Senior Pyt   CLOUDSIG  Sofia, Bul         ['Python3', 'Relational Databases', '. Celery', 'VMWare', '. Django',' Continous Integration']
2  Flask Pyth  Cyber sec  Cairo, Egy  ['Flask', 'Python', '. Software Development', '. Computer Science', '. Information Technology (IT)']
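The skills in this input carry stray prefixes such as '. ' and leading spaces, which would otherwise produce separate columns for the same skill. Here is a rough sketch of cleaning them before encoding. It assumes job_skills holds either real Python lists or their string form (the printout above does not show which), and the to_clean_list helper and the two sample rows are made up for illustration.

```python
import ast
import pandas as pd

# Hypothetical sample mirroring the input above; the first job_skills value
# is a string that looks like a list, the second is a real list
df = pd.DataFrame({
    'job_title': ['Python Or', 'Senior Pyt'],
    'job_skills': [
        "['Python', 'UI', '. Information Technology (IT)']",
        ['Python3', '. Celery', ' Continous Integration'],
    ],
})

def to_clean_list(value):
    # Parse list-like strings; leave real lists as they are
    items = ast.literal_eval(value) if isinstance(value, str) else value
    # Drop leading dots and spaces, and trailing spaces
    return [item.lstrip('. ').strip() for item in items]

# One line per skill, keeping the original row index
skills = df['job_skills'].apply(to_clean_list).explode()

# One-hot encode, then collapse back to one line per job with 0/1 values
encoded = pd.get_dummies(skills).groupby(level=0).max().astype(int)

# Attach the skill columns to the other job attributes
result = df.drop(columns='job_skills').join(encoded)
print(result)
```

Using max instead of sum keeps the values at 0 or 1 even if a skill is listed twice for the same job.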