How to create a new dataframe column using values and groupings from other rows and columns in panda-CodePudding

Consider the following dataframe...

    month   term    
0   Jun-22  one
1   Jun-22  one two
2   Jul-22  one
3   Jul-22  three
4   Jul-22  three four
5   Jul-22  three four five

I would like to add a column that is the count of the rows that contain the 'term' for the corresponding month excluding the row itself.

This is the desired output...

    month   term                term_count
0   Jun-22  one                 1
1   Jun-22  one two             0
2   Jul-22  one                 0
3   Jul-22  three               2
4   Jul-22  three four          1
5   Jul-22  three four five     0

So, for row 0, the term 'one' exists in one other row for Jun-22 so that term_count is 1.

For row 1, no other row in Jun-22 contains 'one two', so term_count is 0.

For row 2, no other row in Jul-22 contains 'one', so term_count is 0.

For row 3, two other rows in Jul-22 contain 'three', so term_count is 2.

etc

What is the best way to achieve this using pandas please?

Thank you.

CodePudding user response：

Use custom lambda function for test if exist values per groups in GroupBy.transform, for avoid count per row is subtract 1:

f = lambda x: [sum(z in w for w in x.tolist()[i:]) for i, z in enumerate(x)]
df['term_count'] = df.groupby('month')['term'].transform(f).sub(1)
print (df)
    month             term  term_count
0  Jun-22              one           1
1  Jun-22          one two           0
2  Jul-22              one           0
3  Jul-22            three           2
4  Jul-22       three four           1
5  Jul-22  three four five           0