I have a data frame in which I need to count the occurrences of each unique value in a certain column. In the example below, I want the resulting count column to be named "NUM_CIK". What's the best way to assign a name to the column produced by the groupby?
Current code:
cik_groupby_cusip_occur = cik_groupby_cusip_occur.groupby(
['CUSIP'], sort=True)['CIK COMPANY'].size().sort_values(ascending=False)
Sample Output:
CUSIP
594918104 4560
037833100 4457
023135106 4053
02079K305 3545
478160104 3472
Wanted Output:
CUSIP NUM_CIK
594918104 4560
037833100 4457
023135106 4053
02079K305 3545
478160104 3472
CodePudding user response:
Use Series.reset_index with the name parameter:
cik_groupby_cusip_occur = (cik_groupby_cusip_occur
.groupby('CUSIP')['CIK COMPANY']
.size()
.sort_values(ascending=False)
.reset_index(name='NUM_CIK'))
Or use Series.value_counts:
cik_groupby_cusip_occur = (cik_groupby_cusip_occur['CUSIP']
.value_counts()
.rename_axis('CUSIP')
.reset_index(name='NUM_CIK'))
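To make the two variants above easy to try, here is a minimal, self-contained sketch. The sample frame `df` and its values are made up for illustration (the real data is not shown in the question); only the column names come from the post.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "CUSIP": ["594918104", "594918104", "594918104",
              "037833100", "037833100", "023135106"],
    "CIK COMPANY": ["A", "B", "C", "A", "B", "A"],
})

# Variant 1: groupby + size, then name the count column via reset_index(name=...).
counts = (
    df.groupby("CUSIP")["CIK COMPANY"]
      .size()
      .sort_values(ascending=False)
      .reset_index(name="NUM_CIK")
)

# Variant 2: value_counts on the key column (already sorted in descending order).
counts_vc = (
    df["CUSIP"]
      .value_counts()
      .rename_axis("CUSIP")
      .reset_index(name="NUM_CIK")
)

print(counts)
```

Both results should be ordinary DataFrames with the columns CUSIP and NUM_CIK, which matches the layout of the wanted output.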
CodePudding user response:
Either use reset_index(name='NUM_CIK'), or use named aggregation, which sets the column name directly (CUSIP stays as the index here):
cik_groupby_cusip_occur = (cik_groupby_cusip_occur
.groupby(['CUSIP'], sort=True)['CIK COMPANY']
.agg(NUM_CIK='size')
.sort_values(by='NUM_CIK', ascending=False)
)
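A small sketch of the named-aggregation approach on made-up data (the frame `df` below is hypothetical). It also adds a trailing reset_index() so that CUSIP becomes a regular column. Since the question mentions "unique items", the second part shows nunique as a possible alternative; that is an assumption about the intent, because size counts all rows per group while nunique counts distinct values.

```python
import pandas as pd

# Hypothetical sample data with repeated CIK values inside a group.
df = pd.DataFrame({
    "CUSIP": ["594918104", "594918104", "594918104", "594918104",
              "037833100", "037833100", "037833100", "023135106"],
    "CIK COMPANY": ["A", "B", "C", "C", "A", "B", "B", "A"],
})

# Named aggregation: the keyword becomes the column name.
named = (
    df.groupby("CUSIP")["CIK COMPANY"]
      .agg(NUM_CIK="size")
      .sort_values(by="NUM_CIK", ascending=False)
      .reset_index()  # move CUSIP from the index into a column
)

# Assumption: if distinct CIK values per CUSIP are wanted, use nunique instead.
distinct = (
    df.groupby("CUSIP")["CIK COMPANY"]
      .nunique()
      .sort_values(ascending=False)
      .reset_index(name="NUM_CIK")
)

print(named)
print(distinct)
```

With this data, size gives 4, 3 and 1 rows per CUSIP, while nunique gives 3, 2 and 1 distinct values, so the choice between them matters whenever a group contains duplicates.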