I have a data frame [df] that looks like this but much lagger:
title of the novel author publishing year mentionned cities
0 Beasts and creatures Bruno Ivory 2021 New York
0 Monsters Renata Mcniar 2023 New York
0 At risk Charles Dobi 2020 London
0 Manuela and Ricardo Lucas Zacci 2022 Rio de Janeiro
0 War against the machine Angelina Trotter 1999 Rio de Janeiro
I would like to add another column with the objective of counting all the occurences of the cities. The problem is that I want to maintain the year of that occurrence, as I work with history. In other words, it is important for me to be able to know when the city was mentionned.
The expected outcome would look like this:
title of the novel author publishing year mentionned cities Counter
0 Beasts and creatures Bruno Ivory 2021 New York 1
0 Monsters Renata Mcniar 2022 New York 2
0 At risk Charles Dobi 2020 London 1
0 Manuela and Ricardo Lucas Zacci 2022 Rio de Janeiro 1
0 War against the machine Angelina Trotter 1999 Rio de Janeiro 2
So far, I have just managed to count all the occurrences, but I could not relate it to the publishing years. The code I am using is:
df ['New York'] = df.eq('New York').sum().to_frame().T
Can someone help me, please?
edit:
I tried joining two dataframes and I got something interesting but not what I really wanted. The problem is that it does not keep the Publishing year on track.
d[f'counter'] = d.fgroupby('mentionned cities')['mentionned cities'].transform('counter')
result = pd.concat([df['New York'], df], axis=1, join='inner')
display(result)
Output:
title of the novel author publishing year mentionned cities Counter
0 Beasts and creatures Bruno Ivory 2021 New York 2
0 Monsters Renata Mcniar 2023 New York 2
0 At risk Charles Dobi 2020 London 1
0 Manuela and Ricardo Lucas Zacci 2022 Rio de Janeiro 1
0 War against the machine Angelina Trotter 1999 Rio de Janeiro 2
The problem still lingers on
CodePudding user response:
df['Counter'] = df.groupby('mentionned cities').cumcount() 1
CodePudding user response:
If I understand it right, you can just concatenate both columns and do the same thing as you did. You need first to convert year into string, which can be done with str function, then it is easy to join both strings with .
CodePudding user response:
Perhaps you could use a for loop to iterate through the 'mentioned cities' column, and use a dict to count the occurrences of cities:
city_count = {}
count_column = []
for city in df['mentionned cities']:
city_count[city] = city_count.get(city, 0) 1
count_column.append(city_count[city])
df['Counter'] = count_column