Home > Software design >  How to create a new dataframe column using values and groupings from other rows and columns in panda
How to create a new dataframe column using values and groupings from other rows and columns in panda

Time:07-18

Consider the following dataframe...

    month   term    
0   Jun-22  one
1   Jun-22  one two
2   Jul-22  one
3   Jul-22  three
4   Jul-22  three four
5   Jul-22  three four five

I would like to add a column that is the count of the rows that contain the 'term' for the corresponding month excluding the row itself.

This is the desired output...

    month   term                term_count
0   Jun-22  one                 1
1   Jun-22  one two             0
2   Jul-22  one                 0
3   Jul-22  three               2
4   Jul-22  three four          1
5   Jul-22  three four five     0

So, for row 0, the term 'one' exists in one other row for Jun-22 so that term_count is 1.

For row 1, no other row in Jun-22 contains 'one two', so term_count is 0.

For row 2, no other row in Jul-22 contains 'one', so term_count is 0.

For row 3, two other rows in Jul-22 contain 'three', so term_count is 2.

etc

What is the best way to achieve this using pandas please?

Thank you.

CodePudding user response:

Use custom lambda function for test if exist values per groups in GroupBy.transform, for avoid count per row is subtract 1:

f = lambda x: [sum(z in w for w in x.tolist()[i:]) for i, z in enumerate(x)]
df['term_count'] = df.groupby('month')['term'].transform(f).sub(1)
print (df)
    month             term  term_count
0  Jun-22              one           1
1  Jun-22          one two           0
2  Jul-22              one           0
3  Jul-22            three           2
4  Jul-22       three four           1
5  Jul-22  three four five           0
  • Related