Home > Software design >  Pandas value_counts with manual labels and sorting
Pandas value_counts with manual labels and sorting

Time:03-13

I have a field containing codes (DMDEDUC2 in this example). I'd like to calculate a frequency table (value_counts) on this field and display it with user-specified labels. The code below achieves exactly what I want... but it feels to me like I have surely missed a more standard way to achieve the intended result.

Logically, the value_counts and the replace lines cannot be simplified. But surely the rest could be more elegant.

Is there a simpler way to get to this result? A more panda-ish solution?

# Tiny dataset for clarity
import pandas as pd
df = pd.DataFrame({ 'DMDEDUC2': [5, 3, 3, 5, 4, 2, 4, 4] })
d = {
      1: "<9"
    , 2: "9-11"
    , 3: "HS/GED"
    , 4: "Some college/AA"
    , 5: "College"
    , 7: "Refused"
    , 9: "Don't know"
}

# First get value counts (vc) for DMDEDUC2
# This line gets all the data I need in the correct order... 
# but without the labels I need.
vc = df.DMDEDUC2.value_counts().sort_index()

# Convert the resulting Series to a DataFrame 
# to allow for clear labels in a logical order
vc = vc.to_frame()
vc['DMDEDUC2x'] = vc.index
vc.DMDEDUC2x = vc.DMDEDUC2x.replace(d)
vc = vc.set_index('DMDEDUC2x')
vc = vc.rename({'DMDEDUC2':'COUNTS'}, axis=1)
print(vc)

Desired output (it's ordered by the [non-displayed] code, not by value or label):

                 COUNTS
DMDEDUC2x              
<9                  655
9-11                643
HS/GED             1186
Some college/AA    1621
College            1366
Don't know            3

Desired output for the tiny sample dataset:

                 COUNTS
DMDEDUC2x              
9-11                  1
HS/GED                2
Some college/AA       3
College               2

CodePudding user response:

I think it can easily be condensed into two lines:

vc = df.DMDEDUC2.value_counts().sort_index().to_frame(name='COUNTS')
vc.index = vc.index.map(d).rename('DMDEDUC2')
  • Related