i have this list for example dates = ["2020-2-1", "2020-2-3", "2020-5-8"]
now i want to make a dataframe which contains only the month and year then the count of how many times it appeared, the output should be like
Date | Count |
---|---|
2020-2 | 2 |
2020-5 | 1 |
CodePudding user response:
import pandas as pd
dates = ["2020-2-1", "2020-2-3", "2020-5-8"]
df = pd.DataFrame({'Date':dates})
df['Date'] = df['Date'].str.slice(0,6)
df['Count'] = 1
df = df.groupby('Date').sum().reset_index()
Note: you might want to use the format "2020-02-01" with padded zeros so that the first 7 characters are always the year and month
CodePudding user response:
This will give you a "month" and "year" column with the count of the year/month
If you want you could just combine the month/year columns together, but this will give you the results you expect if not exactly cleaned up.
df = pd.DataFrame({'Column1' : ["2020-2-1", "2020-2-3", "2020-5-8"]})
df['Month'] = pd.to_datetime(df['Column1']).dt.month
df['Year'] = pd.to_datetime(df['Column1']).dt.year
df.groupby(['Month', 'Year']).agg('count').reset_index()