Calculating total number of values based on same id in pandas dataframe

I have a dataframe that looks like this:

api_spec_id       commitdates       commits Year-Month API Age info_version  
84                  2014-12-15      110      2014-12     110      6.0.1  
84                  2014-11-06       33      2014-11      33      6.0.2
84                  2014-10-15      110      2014-10     110      6.0.3
84                  2014-12-02      110      2014-12     110      6.0.5
84                  2014-11-19       33      2014-11      33      7.0.2

api_spec_id is the id of each API in the dataframe. The same API can have several versions under the same id, because the version keeps changing from one commit date to the next.

For api_spec_id = 84, I want to count how many versions there are in total; here there are 5.

My desired output is:

api_spec_id       commitdates       commits Year-Month API Age info_version  Total_versions
84                  2014-12-15      110      2014-12     110      6.0.1       5
84                  2014-11-06       33      2014-11      33      6.0.2       5
84                  2014-10-15      110      2014-10     110      6.0.3       5
84                  2014-12-02      110      2014-12     110      6.0.5       5
84                  2014-11-19       33      2014-11      33      7.0.2       5

I tried value_counts(), sum(), and a few other solutions from similar questions here on Stack Overflow, but none of them gave me the numbers I want. What would be the best way to go about this? Any guidance would be really helpful.

CodePudding user response:

You can use DataFrame.groupby together with transform('nunique') for this:

df['Total_versions'] = df.groupby('api_spec_id').info_version.transform('nunique')

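It counts the number of unique values in the column 'info_version' for each 'api_spec_id'. Because transform returns a result aligned with the original rows, the count is repeated on every row of the group.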

Output:

   api_spec_id  commitdates  commits  Year-Month  API_Age  info_version  Total_versions
0           84   2014-12-15      110     2014-12      110         6.0.1               5
1           84   2014-11-06       33     2014-11       33         6.0.2               5
2           84   2014-10-15      110     2014-10      110         6.0.3               5
3           84   2014-12-02      110     2014-12      110         6.0.5               5
4           84   2014-11-19       33     2014-11       33         7.0.2               5
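
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The rows for api_spec_id 84 are taken from the question (only the columns needed for the count); the rows for api_spec_id 85 and the Total_rows column are hypothetical additions, included only to show that the count is computed per id and how it differs from a plain row count.

```python
import pandas as pd

# Rows for id 84 come from the question. Rows for id 85 are hypothetical,
# added only to show per-group behaviour with a repeated version.
df = pd.DataFrame({
    "api_spec_id": [84, 84, 84, 84, 84, 85, 85, 85],
    "commitdates": ["2014-12-15", "2014-11-06", "2014-10-15", "2014-12-02",
                    "2014-11-19", "2015-01-05", "2015-01-20", "2015-02-03"],
    "info_version": ["6.0.1", "6.0.2", "6.0.3", "6.0.5", "7.0.2",
                     "1.0.0", "1.0.0", "1.1.0"],
})

# Number of distinct versions per id, broadcast back onto every row.
df["Total_versions"] = (
    df.groupby("api_spec_id")["info_version"].transform("nunique")
)

# For comparison (hypothetical extra column): number of non-null rows per id.
# This is what a plain count gives, and it differs when versions repeat.
df["Total_rows"] = df.groupby("api_spec_id")["info_version"].transform("count")

print(df)
```

With this data, id 84 gets 5 in both columns, while id 85 gets Total_versions = 2 and Total_rows = 3, because version 1.0.0 appears twice there.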