I have constructed a pretty basic dataframe where the column names are years:
import pandas as pd
column_names = [x for x in range(2000,2005)]
df = pd.DataFrame(columns=column_names)
Which of course gives me a dataframe where the column names are years and currently has no entries.
2000 2001 2002 2003 2004
I also have a different dataframe where one column has specific dates and the second column has the corresponding year. I call this dataframe set0 because I will continously add more sets which are then numbered (set1,set2,etc.).
data = {'Date': ['2001-06-08', '2002-05-23', '2002-05-24', '2003-06-23'],
'Year': [2001, 2002, 2002, 2003]}
df2 = pd.DataFrame(data)
Date Year
0 2001-06-08 2001
1 2002-05-23 2002
2 2002-05-24 2002
3 2003-06-23 2003
Now what I want to do is to create something like this: It takes the first dataframe, adds a first column which has a name of a certain dataset, in this case set0. I will then group this dataset by years and if I have an entry for a year I can create this:
set_name 2000 2001 2002 2003 2004
set0 0 1 2 1 0
I have found nothing similar on the web. I have done the grouping but then wasn't able to add the entries in the corresponding columns. Any help or hint is much appreciated!
CodePudding user response:
Does this answer your question ?
import pandas as pd
column_names = [x for x in range(2000, 2005)]
df = pd.DataFrame(index=column_names)
data = {
'Date': ['2001-06-08', '2002-05-23', '2002-05-24', '2003-06-23'],
'Year': [2001, 2002, 2002, 2003]
}
df2 = pd.DataFrame(data)
df2_grouped = df2.groupby('Year').count()['Date']
df['set0'] = df2_grouped
# Pandas version 1.5
df = df.fillna(0).reset_index(names='set_name').pivot_table(columns='set_name')
# All versions :
df = df.fillna(0).reset_index().rename(columns={
'index': 'set_name'
}).pivot_table(columns='set_name')
print(df)
Result :
set_name 2000 2001 2002 2003 2004
set0 0.0 1.0 2.0 1.0 0.0