Home > database >  Pandas groupby using only year and month
Pandas groupby using only year and month

Time:03-30

I have a Python program using Pandas, which reads two dataframes, obtained in the following links:

Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias: https://datamexico.org/es/profile/geo/san-nicolas-de-los-garza#covid19-evolucion

Denuncias-segun-bien-afectado-en-San-Nicolas-de-los-GarzaClic-en-el-grafico-para-seleccionar: https://datamexico.org/es/profile/geo/san-nicolas-de-los-garza#seguridad-publica-denuncias

What I currently want to do is a groupby in the "covid" dataframe with the same dates, having a sum of these. Regardless, no method has worked out, which regularly prints an error indicating that I should be using a syntaxis for "PeriodIndex". Does anyone have a suggestion or solution? Thanks in advance.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib notebook

#csv for the covid cases
covid = pd.read_csv('Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias.csv')

#csv for complaints
comp = pd.read_csv('Denuncias-segun-bien-afectado-en-San-Nicolas-de-los-GarzaClic-en-el-grafico-para-seleccionar.csv')

#cleaning data in both dataframes

#keeping only the relevant columns
covid = covid[['Month','Daily Cases']]
comp = comp[['Month','Affected Legal Good', 'Value']]

#changing the labels from spanish to english
comp['Affected Legal Good'].replace({'Patrimonio': 'Heritage', 'Familia':'Family', 'Libertad y Seguridad Sexual':'Sexual Freedom and Safety', 'Sociedad':'Society', 'Vida e Integridad Corporal':'Life and Bodily Integrity', 'Libertad Personal':'Personal Freedom', 'Otros Bienes Jurídicos Afectados (Del Fuero Común)':'Other Affected Legal Assets (Common Jurisdiction)'}, inplace=True, regex=True)
#changing the month types to dates
covid['Month'] = pd.to_datetime(covid['Month'])
covid['Month'] = covid['Month'].dt.to_period('M')

covid

CodePudding user response:

You can simply usen group by statement.Timegrouper by default converts it to datetime

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib notebook

#csv for the covid cases
covid = pd.read_csv('Casos-positivos-diarios-en-San-Nicolas-de-los-Garza-Promedio-movil-de-7-dias.csv')


covid = covid.groupby(['Month'])['Daily Cases'].sum()
covid = covid.reset_index()
# #changing the month types to dates
covid['Month'] = pd.to_datetime(covid['Month'])
covid['Month'] = covid['Month'].dt.to_period('M')

covid
  • Related