How to name the column when using the value_counts function in pandas?

Time:04-11

I was counting the number of occurrences of each angle and distance combination with the code below:

g = new_df.value_counts(subset=['Current_Angle','Current_dist'] ,sort = False)

the output:

current_angle    current_dist    0
 -50                30           1
 -50                40           2
 -50                41           6
 -50                45           4

try1:
g.columns = ['angle','Distance','count','Percentage Missed'] - the column names did not change

try2:
Printing the columns with print(g.columns) ended with the error AttributeError: 'Series' object has no attribute 'columns'

I want to rename column 0 to count and add a new column, percent missed, to g, calculated as 100 minus the value in column 0.

Expected output

current_angle    current_dist    count  percent missed
 -50                30           1          99
 -50                40           2          98
 -50                41           6          94
 -50                45           4          96

1. How should I modify the code? That is, instead of value_counts, is there another function that can give the expected output?
2. How can I get the expected output with the current method?

CodePudding user response:

DataFrame.value_counts returns a Series, so first call Series.reset_index; its name parameter lets you call the counts column count instead of 0. Then create the new column with Series.rsub, which subtracts from the right side, i.e. 100 - df['count']:

df = (new_df.value_counts(subset=['Current_Angle','Current_dist'] ,sort = False)
            .reset_index(name='count')
            .assign(**{'percent missed': lambda x: x['count'].rsub(100)}))
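
To try this end to end, here is a minimal sketch. The sample rows are made up to reproduce the counts shown in the question; new_df in the real case is whatever DataFrame the asker already has.

```python
import pandas as pd

# Hypothetical sample data: 1, 2, 6 and 4 rows per (angle, dist) pair
rows = [(-50, 30)] * 1 + [(-50, 40)] * 2 + [(-50, 41)] * 6 + [(-50, 45)] * 4
new_df = pd.DataFrame(rows, columns=['Current_Angle', 'Current_dist'])

# value_counts on a DataFrame returns a Series indexed by the subset columns,
# which is why g.columns raised AttributeError in the question
counts = new_df.value_counts(subset=['Current_Angle', 'Current_dist'], sort=False)
print(type(counts).__name__)

# reset_index(name=...) turns the index levels into columns and names the counts
result = (counts.reset_index(name='count')
                .assign(**{'percent missed': lambda x: x['count'].rsub(100)}))
print(result)
```

The dictionary unpacking in assign is only needed because 'percent missed' contains a space and so cannot be written as a keyword argument.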

Or, if you also need to set new column names, use DataFrame.set_axis:

df = (new_df.value_counts(subset=['Current_Angle','Current_dist'] ,sort = False)
            .reset_index(name='count')
            .set_axis(['angle','Distance','count'], axis=1)
            .assign(**{'percent missed': lambda x: x['count'].rsub(100)}))

If you need to assign new column names, here is an alternative solution:

df = (new_df.value_counts(subset=['Current_Angle','Current_dist'] ,sort = False)
            .reset_index())
df.columns = ['angle','Distance','count']
df['percent missed'] = df['count'].rsub(100)

CodePudding user response:

Assuming a DataFrame as input (if not, call reset_index first), simply use rename and a subtraction:

df = df.rename(columns={'0': 'count'})   # assuming string '0' here, else use 0
df['percent missed'] = 100 - df['count']

output:

   current_angle  current_dist  count  percent missed
0            -50            30      1              99
1            -50            40      2              98
2            -50            41      6              94
3            -50            45      4              96
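
The label of the counts column after reset_index() can differ: it is the integer 0 when the Series is unnamed, and as far as I know newer pandas releases (2.0 and later) name it 'count' already. A sketch that avoids guessing is to name the Series before resetting the index. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical sample data (not from the original post)
rows = [(-50, 30)] + [(-50, 40)] * 2 + [(-50, 41)] * 6 + [(-50, 45)] * 4
new_df = pd.DataFrame(rows, columns=['current_angle', 'current_dist'])

counts = new_df.value_counts(subset=['current_angle', 'current_dist'], sort=False)

# Series.rename with a scalar sets the Series name, so reset_index()
# produces a 'count' column whatever the Series was called before
df = counts.rename('count').reset_index()
df['percent missed'] = 100 - df['count']
print(df)
```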

alternative: using groupby.size:
(new_df
 .groupby(['current_angle','current_dist']).size()
 .reset_index(name='count')
 .assign(**{'percent missed': lambda d: 100-d['count']})
)

output:

   current_angle  current_dist  count  percent missed
0            -50            30      1              99
1            -50            40      2              98
2            -50            41      6              94
3            -50            45      4              96
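
To check that the groupby.size route and the value_counts route agree, the two can be compared on the same input. This is a small sketch on made-up data; both results are sorted by the key columns first, so that row order does not affect the comparison.

```python
import pandas as pd

# Hypothetical sample data (not from the original post)
rows = [(-50, 30)] + [(-50, 40)] * 2 + [(-50, 41)] * 6 + [(-50, 45)] * 4
new_df = pd.DataFrame(rows, columns=['current_angle', 'current_dist'])
keys = ['current_angle', 'current_dist']

by_groupby = new_df.groupby(keys).size().reset_index(name='count')
by_value_counts = (new_df.value_counts(subset=keys, sort=False)
                         .reset_index(name='count'))

# Normalise row order before comparing the two frames
by_groupby = by_groupby.sort_values(keys).reset_index(drop=True)
by_value_counts = by_value_counts.sort_values(keys).reset_index(drop=True)

print(by_groupby.equals(by_value_counts))
```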