How to count the occurrences of a value in a data frame?

Time:07-11

I have a data frame [df] that looks like this, but much larger:

title of the novel                author          publishing year   mentionned cities   
0   Beasts and creatures        Bruno Ivory             2021           New York 
0   Monsters                    Renata Mcniar           2023           New York 
0   At risk                     Charles Dobi            2020           London   
0   Manuela and Ricardo         Lucas Zacci             2022           Rio de Janeiro
0   War against the machine     Angelina Trotter        1999           Rio de Janeiro

I would like to add another column that counts the occurrences of each city. The problem is that I want to keep the year of each occurrence, as I work with history. In other words, it is important for me to know when the city was mentioned.

The expected outcome would look like this:

title of the novel     author    publishing year    mentionned cities       Counter
0   Beasts and creatures        Bruno Ivory             2021           New York   1 
0   Monsters                    Renata Mcniar           2023           New York   2
0   At risk                     Charles Dobi            2020           London     1
0   Manuela and Ricardo         Lucas Zacci             2022           Rio de Janeiro 1
0   War against the machine     Angelina Trotter        1999           Rio de Janeiro 2

So far, I have just managed to count all the occurrences, but I could not relate it to the publishing years. The code I am using is:

df ['New York'] = df.eq('New York').sum().to_frame().T

Can someone help me, please?

edit:

I tried joining two dataframes and I got something interesting but not what I really wanted. The problem is that it does not keep the Publishing year on track.

    df['counter'] = df.groupby('mentionned cities')['mentionned cities'].transform('count')

    result = pd.concat([df['New York'], df], axis=1, join='inner')
    display(result)

Output:

title of the novel     author    publishing year    mentionned cities       Counter
    0   Beasts and creatures        Bruno Ivory             2021           New York   2 
    0   Monsters                    Renata Mcniar           2023           New York   2
    0   At risk                     Charles Dobi            2020           London     1
    0   Manuela and Ricardo         Lucas Zacci             2022           Rio de Janeiro 1
    0   War against the machine     Angelina Trotter        1999           Rio de Janeiro 2

The problem persists.

CodePudding user response:

df['Counter'] = df.groupby('mentionned cities').cumcount() + 1
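To make the one-liner above easier to try out, here is a minimal, self-contained sketch that rebuilds the sample table from the question (with a default 0..4 index, which is an assumption) and applies a per-city running count. The "Counter by year" column is a hypothetical extra, not part of the original answer: it numbers the mentions of each city in chronological order, in case that is what "keeping the year" means.

```python
import pandas as pd

# Sample data copied from the question; column names are kept as spelled there.
df = pd.DataFrame({
    "title of the novel": ["Beasts and creatures", "Monsters", "At risk",
                           "Manuela and Ricardo", "War against the machine"],
    "author": ["Bruno Ivory", "Renata Mcniar", "Charles Dobi",
               "Lucas Zacci", "Angelina Trotter"],
    "publishing year": [2021, 2023, 2020, 2022, 1999],
    "mentionned cities": ["New York", "New York", "London",
                          "Rio de Janeiro", "Rio de Janeiro"],
})

# cumcount() numbers the rows of each group 0, 1, 2, ... in row order,
# so adding 1 gives a running count per city.
df["Counter"] = df.groupby("mentionned cities").cumcount() + 1

# Hypothetical variant: count in order of publishing year instead of row order.
# The result is aligned back to the original rows through the index.
by_year = df.sort_values("publishing year", kind="stable")
df["Counter by year"] = by_year.groupby("mentionned cities").cumcount() + 1

print(df[["publishing year", "mentionned cities", "Counter", "Counter by year"]])
```

Every row keeps its publishing year, so the count can be read alongside the year in which the city was mentioned.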

CodePudding user response:

If I understand correctly, you can concatenate both columns and then do the same thing you did. First convert the year to a string, which can be done with the str function; then it is easy to join both strings with +.
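One possible reading of this suggestion, sketched under assumptions: build a combined city/year key and count on that key. The column names `city_year` and `city_year_count` are hypothetical, and `astype(str)` is used here as the column-wise equivalent of calling `str` on each year.

```python
import pandas as pd

# Reduced sample from the question (only the two columns involved).
df = pd.DataFrame({
    "publishing year": [2021, 2023, 2020, 2022, 1999],
    "mentionned cities": ["New York", "New York", "London",
                          "Rio de Janeiro", "Rio de Janeiro"],
})

# Convert the year to a string and join it to the city name with +.
df["city_year"] = df["mentionned cities"] + " " + df["publishing year"].astype(str)

# Count how many rows share the same city/year combination.
df["city_year_count"] = df.groupby("city_year")["city_year"].transform("count")

print(df)
```

In this small sample every city/year pair occurs only once, so each count is 1. The key becomes more informative on a larger frame where a city is mentioned several times in the same year.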

CodePudding user response:

Perhaps you could use a for loop to iterate through the 'mentionned cities' column and use a dict to keep a running count of each city:

city_count = {}
count_column = []
for city in df['mentionned cities']:
    city_count[city] = city_count.get(city, 0) + 1
    count_column.append(city_count[city])

df['Counter'] = count_column