Home > OS >  How to find a row content from a Pandas df into another df and then add the count of the findings in
How to find a row content from a Pandas df into another df and then add the count of the findings in

Time:07-31

I just started using Python and I am trying to improve myself by engaging in different projects.

At the moment I am trying to create a work allocation tool. I loaded 2 xlsx files as df.

In df "dfau" there is a list of the employees with their capacity left and the languages that they speak.

Employees Capacity Left Languages
E1 0 EN, GER
E2 -11 EN, IT, ES
E3 6 EN, CH
E4 2 EN, ES, FR
E5 -1 EN
E6 1 EN
E7 -9 EN, GER
E8 8 EN, GER

In df "dfln" there is a list of the languages.

Languages
EN
IT
ES
GER
CH
FR
NL
PT

I would like to look for the languages listed in "dfln" into "dfau" and add the count of these languages as a second column in "dfln" (and this is where I am stuck).

Afterwards, in "dfln", I would also like to add a column which would show the capacity per each language by summing the capacities left of each employee which can speak the language.

The employees which have a "Capacity Left" below 1 should not be counted nor summed.

This is how it should appear:

Languages Count Capacity
EN 6 85
IT 0 0
ES 3 15
GER 1 8
CH 1 6
FR 1 2
NL 1 45
PT 2 13

I tried to re-write the code in different ways but still this is where I got stuck (the result remains the basic dfln.

import pandas as pd
dfau = pd.read_excel (r'C:\Users\Projects\pywp\SampleAu.xlsx', sheet_name='Employees')
dfln = pd.read_excel (r'C:\Users\Projects\pywp\SampleAu.xlsx', sheet_name='Languages')
    
LCount=0
dfln.insert(1,"Count",LCount)
    
for language in dfln["Languages"]:
    if dfau.loc[dfau["Languages"].str.contains(language, case=False)] is True:
        LCount =1
    print(dfln)

What can I try next?

CodePudding user response:

I cannot make sense of the final output. However, you can achieve what you are asking by running the following code:

dfIn['Count'] = 0
dfIn['Capacity'] = 0

for idx,row in dfIn.iterrows():
    dfIn.iloc[idx,1] = dfau['Languages'][dfau['Capacity Left'] >= 1].str.contains(row.Languages).sum()
    dfIn.iloc[idx, 2] = dfau['Capacity Left'][dfau['Capacity Left'] >= 1][dfau['Languages'].str.contains(row.Languages)].sum()

Notice that the final output looks like this, and not what you posted above:

enter image description here

  • Related