Home > Net >  use custom column list in pandas crosstab
use custom column list in pandas crosstab

Time:06-17

I have a dataframe like as below

ID,Region,year,output
1,ANZ,1978,1
1,ANZ,2019,1
1,ANZ,2021,1
1,ASEAN,2021,1
1,ASEAN,2021,2
1,ASEAN,2020,3
2,UK,2021,8
2,UK,2021,1
2,UK,2021,0

I would like to do the below

a) create 4 year columns year_2019,year_2020,year_2021 and year_2022.

b) put the count of records for each ID under the respective year columns

I tried the below

pd.crosstab(
      index=tf['ID'], columns=tf['year'],
      values=tf['year'], aggfunc='count').fillna(0)

But the problem is it convert all year values as columns in output dataframe.

My real dataframe has 4 million rows and has 50 unique year values.(from 1970 to 2022)

But I want the count of year values only for 2019,2020,2021 and 2022.

I expect my output to be like as below

ID,year_2019,year_2020,year_2021,year_2022
1,1,1,3,0
2,0,0,3,0

CodePudding user response:

You can use:

out = (pd
 .crosstab(df['ID'], df['year'])
 .reindex(range(2019, 2022 1), axis=1, fill_value=0)
 .add_prefix('year_')
 .reset_index()
 .rename_axis(columns=None)
)

output:

   ID  year_2019  year_2020  year_2021  year_2022
0   1          1          1          3          0
1   2          0          0          3          0
  • Related