Consider the following dataframe...
month term
0 Jun-22 one
1 Jun-22 one two
2 Jul-22 one
3 Jul-22 three
4 Jul-22 three four
5 Jul-22 three four five
I would like to add a column that is the count of the rows that contain the 'term' for the corresponding month excluding the row itself.
This is the desired output...
month term term_count
0 Jun-22 one 1
1 Jun-22 one two 0
2 Jul-22 one 0
3 Jul-22 three 2
4 Jul-22 three four 1
5 Jul-22 three four five 0
So, for row 0, the term 'one' exists in one other row for Jun-22 so that term_count is 1.
For row 1, no other row in Jun-22 contains 'one two', so term_count is 0.
For row 2, no other row in Jul-22 contains 'one', so term_count is 0.
For row 3, two other rows in Jul-22 contain 'three', so term_count is 2.
etc
What is the best way to achieve this using pandas please?
Thank you.
CodePudding user response:
Use custom lambda function for test if exist values per groups in GroupBy.transform
, for avoid count per row is subtract 1
:
f = lambda x: [sum(z in w for w in x.tolist()[i:]) for i, z in enumerate(x)]
df['term_count'] = df.groupby('month')['term'].transform(f).sub(1)
print (df)
month term term_count
0 Jun-22 one 1
1 Jun-22 one two 0
2 Jul-22 one 0
3 Jul-22 three 2
4 Jul-22 three four 1
5 Jul-22 three four five 0