How to find a row content from a Pandas df into another df and then add the count of the findings in-CodePudding

I just started using Python and I am trying to improve myself by engaging in different projects.

At the moment I am trying to create a work allocation tool. I loaded 2 xlsx files as df.

In df "dfau" there is a list of the employees with their capacity left and the languages that they speak.

Employees	Capacity Left	Languages
E1	0	EN, GER
E2	-11	EN, IT, ES
E3	6	EN, CH
E4	2	EN, ES, FR
E5	-1	EN
E6	1	EN
E7	-9	EN, GER
E8	8	EN, GER

In df "dfln" there is a list of the languages.

Languages
EN
IT
ES
GER
CH
FR
NL
PT

I would like to look for the languages listed in "dfln" into "dfau" and add the count of these languages as a second column in "dfln" (and this is where I am stuck).

Afterwards, in "dfln", I would also like to add a column which would show the capacity per each language by summing the capacities left of each employee which can speak the language.

The employees which have a "Capacity Left" below 1 should not be counted nor summed.

This is how it should appear:

Languages	Count	Capacity
EN	6	85
IT	0	0
ES	3	15
GER	1	8
CH	1	6
FR	1	2
NL	1	45
PT	2	13

I tried to re-write the code in different ways but still this is where I got stuck (the result remains the basic dfln.

import pandas as pd
dfau = pd.read_excel (r'C:\Users\Projects\pywp\SampleAu.xlsx', sheet_name='Employees')
dfln = pd.read_excel (r'C:\Users\Projects\pywp\SampleAu.xlsx', sheet_name='Languages')
    
LCount=0
dfln.insert(1,"Count",LCount)
    
for language in dfln["Languages"]:
    if dfau.loc[dfau["Languages"].str.contains(language, case=False)] is True:
        LCount =1
    print(dfln)

What can I try next?

CodePudding user response：

I cannot make sense of the final output. However, you can achieve what you are asking by running the following code:

dfIn['Count'] = 0
dfIn['Capacity'] = 0

for idx,row in dfIn.iterrows():
    dfIn.iloc[idx,1] = dfau['Languages'][dfau['Capacity Left'] >= 1].str.contains(row.Languages).sum()
    dfIn.iloc[idx, 2] = dfau['Capacity Left'][dfau['Capacity Left'] >= 1][dfau['Languages'].str.contains(row.Languages)].sum()

Notice that the final output looks like this, and not what you posted above: