I have a field containing codes (DMDEDUC2 in this example). I'd like to calculate a frequency table (value_counts) on this field and display it with user-specified labels. The code below achieves exactly what I want... but it feels to me like I have surely missed a more standard way to achieve the intended result.
Logically, the value_counts
and the replace
lines cannot be simplified. But surely the rest could be more elegant.
Is there a simpler way to get to this result? A more panda-ish solution?
# Tiny dataset for clarity
import pandas as pd
df = pd.DataFrame({ 'DMDEDUC2': [5, 3, 3, 5, 4, 2, 4, 4] })
d = {
1: "<9"
, 2: "9-11"
, 3: "HS/GED"
, 4: "Some college/AA"
, 5: "College"
, 7: "Refused"
, 9: "Don't know"
}
# First get value counts (vc) for DMDEDUC2
# This line gets all the data I need in the correct order...
# but without the labels I need.
vc = df.DMDEDUC2.value_counts().sort_index()
# Convert the resulting Series to a DataFrame
# to allow for clear labels in a logical order
vc = vc.to_frame()
vc['DMDEDUC2x'] = vc.index
vc.DMDEDUC2x = vc.DMDEDUC2x.replace(d)
vc = vc.set_index('DMDEDUC2x')
vc = vc.rename({'DMDEDUC2':'COUNTS'}, axis=1)
print(vc)
Desired output (it's ordered by the [non-displayed] code, not by value or label):
COUNTS
DMDEDUC2x
<9 655
9-11 643
HS/GED 1186
Some college/AA 1621
College 1366
Don't know 3
Desired output for the tiny sample dataset:
COUNTS
DMDEDUC2x
9-11 1
HS/GED 2
Some college/AA 3
College 2
CodePudding user response:
I think it can easily be condensed into two lines:
vc = df.DMDEDUC2.value_counts().sort_index().to_frame(name='COUNTS')
vc.index = vc.index.map(d).rename('DMDEDUC2')