Here's a DataFrame, `dist_copy`, with columns I would like to convert to the categorical dtype. I've only included one column here, but there are other columns I want to convert as well.
state | dist_id | pct_free_reduced_lunch |
---|---|---|
Illinois | 1111 | 80% - 100% |
Illinois | 1112 | 0 - 20% |
Illinois | 2365 | 40% - 60% |
Calling `dist_copy.pct_free_reduced_lunch.unique()` returns:
`array(['80% - 100%', '60% - 80%', '0 - 20%', '20% - 40%', '40% - 60%'], dtype=object)`
Previously, I used `pd.Categorical` to convert all the values in the 'pct_free_reduced_lunch' column to categorical and set their order, with this code:
dist_copy['pct_free_reduced_lunch'] = pd.Categorical(dist_copy['pct_free_reduced_lunch'],
categories=['0 — 20%','20% — 40%', '40% — 60%', '60% — 80%', '80% - 100%'], ordered=True)
Today, this code isn't working: only the first value ('80% - 100%') is kept, and all the other values are changed to NaN.
state | dist_id | pct_free_reduced_lunch |
---|---|---|
Illinois | 1111 | 80% - 100% |
Illinois | 1112 | NaN |
Illinois | 2365 | NaN |
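A minimal reproduction may help pin this down. It uses a hypothetical three-row frame `df` in place of `dist_copy` and a hypothetical `typed_categories` list that mirrors the question's code, with each em dash written as a `\u2014` escape so it is visible. `pd.Categorical` replaces any value that is not listed in `categories` with NaN, so only the rows that exactly match a category survive.

```python
import pandas as pd

# Hypothetical stand-in for dist_copy: the data uses plain hyphens (U+002D)
df = pd.DataFrame(
    {"pct_free_reduced_lunch": ["80% - 100%", "0 - 20%", "40% - 60%"]}
)

# Categories as typed in the question: em dashes (U+2014) in every label
# except the last one, which uses a hyphen
typed_categories = [
    "0 \u2014 20%",
    "20% \u2014 40%",
    "40% \u2014 60%",
    "60% \u2014 80%",
    "80% - 100%",
]

result = pd.Categorical(
    df["pct_free_reduced_lunch"], categories=typed_categories, ordered=True
)

# Only '80% - 100%' matches a category; the other values become NaN
print(pd.isna(result).tolist())  # [False, True, True]
```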
What am I doing wrong, or misunderstanding?
UPDATE: The code above started working AFTER I copy-pasted each category value from the array returned by `unique()` into the `categories` list of `pd.Categorical`, in the desired order.
When I typed them in from scratch, NaNs were created.
WHY? I would really love to know!
Answer:
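Your `categories` list uses em dashes (U+2014) instead of hyphens (U+002D) in every label except '80% - 100%'. The data contains only hyphens, so those labels match none of the values, and `pd.Categorical` replaces any value that is not in `categories` with NaN. That is why every value except '80% - 100%' turned into NaN, and why copy-pasting the labels from `unique()` fixed it: the pasted strings contain the same hyphens as the data.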
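Since an em dash and a hyphen look almost identical on screen, one way to confirm the mismatch is to list the Unicode names of any non-ASCII characters in each label. The sketch below uses only the standard `unicodedata` module; `non_ascii_names`, `from_data` and `typed` are hypothetical names, and the two labels are taken from the question.

```python
import unicodedata


def non_ascii_names(label):
    """Return the Unicode names of all non-ASCII characters in label."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in label if ord(ch) > 127]


from_data = "0 - 20%"   # as returned by unique(): hyphen-minus (U+002D)
typed = "0 \u2014 20%"   # as typed in the categories list: em dash (U+2014)

print(from_data == typed)          # False
print(non_ascii_names(from_data))  # []
print(non_ascii_names(typed))      # ['EM DASH']
```

Running the same check over every label in the `categories` list quickly shows which ones were typed with the wrong character.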
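Try using `categories=sorted(dist_copy.pct_free_reduced_lunch.unique())` to avoid this kind of typo. Note that `sorted()` compares the strings character by character. Here that happens to give the intended order ('0 - 20%' through '80% - 100%'), but with other labels (for example '5% - 10%' and '10% - 20%') it would not, and you would need to list the order explicitly.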
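Since the question mentions several columns, one option is to keep an explicit order for each column in a dict, normalise dash variants in the typed labels, and fail loudly on any value that matches no category instead of silently getting NaN. This is a sketch under assumptions: the sample frame, the `orders` dict and the `normalize_dashes` helper are hypothetical, and treating en and em dashes as hyphens is an assumption about how the labels are meant to be spelled.

```python
import pandas as pd

# Hypothetical sample frame standing in for dist_copy
df = pd.DataFrame(
    {
        "state": ["Illinois", "Illinois", "Illinois"],
        "dist_id": [1111, 1112, 2365],
        "pct_free_reduced_lunch": ["80% - 100%", "0 - 20%", "40% - 60%"],
    }
)

# Desired order for each column to convert (add one entry per column).
# The em dash in the first label is deliberate: normalization repairs it.
orders = {
    "pct_free_reduced_lunch": [
        "0 \u2014 20%", "20% - 40%", "40% - 60%", "60% - 80%", "80% - 100%",
    ],
}


def normalize_dashes(label):
    """Replace en dashes (U+2013) and em dashes (U+2014) with hyphen-minus."""
    return label.replace("\u2013", "-").replace("\u2014", "-")


for col, order in orders.items():
    categories = [normalize_dashes(c) for c in order]
    # Raise instead of letting unmatched values silently become NaN
    unmatched = set(df[col].dropna()) - set(categories)
    if unmatched:
        raise ValueError(f"{col}: values not in categories: {sorted(unmatched)}")
    df[col] = df[col].astype(pd.CategoricalDtype(categories, ordered=True))

print(df["pct_free_reduced_lunch"].isna().sum())  # 0
print(df["pct_free_reduced_lunch"].min())          # 0 - 20%
```

Using `pd.CategoricalDtype` with `astype` keeps the order definition separate from the data, so the same dict can drive every column you want to convert.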