How do I successfully create ordered categorical data from an existing column?

Time:10-29

Here's a DataFrame, 'dist_copy', whose values I would like to convert to categorical. I've only included one column here, but there are other columns I want to convert as well.

state     dist_id  pct_free_reduced_lunch
Illinois  1111     80% - 100%
Illinois  1112     0 - 20%
Illinois  2365     40% - 60%

dist_copy.pct_free_reduced_lunch.unique()

returns

array(['80% - 100%', '60% - 80%', '0 - 20%', '20% - 40%', '40% - 60%'], dtype=object)

Previously, I used pd.Categorical to convert the 'pct_free_reduced_lunch' column to an ordered categorical type, with this code:

dist_copy['pct_free_reduced_lunch'] = pd.Categorical(dist_copy['pct_free_reduced_lunch'], 
     categories=['0 — 20%','20% — 40%', '40% — 60%', '60% — 80%',  '80% - 100%'], ordered=True)

Today, this code isn't working, and only retains the first value, changing all other values to NaN.

state     dist_id  pct_free_reduced_lunch
Illinois  1111     80% - 100%
Illinois  1112     NaN
Illinois  2365     NaN

What am I doing wrong, or misunderstanding?

UPDATE: The above code began to work AFTER I copy-pasted each value from the array returned by unique() into the categories list passed to pd.Categorical, in the desired order.

When I typed them in from scratch, I got NaNs instead.

WHY? I would really love to know!

Answer:

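Your categories argument uses em dashes (U+2014) instead of hyphens (U+002D) in every label except '80% - 100%'. The data contains only hyphens, and pd.Categorical replaces any value that is not among the given categories with NaN, so every label except '80% - 100%' becomes NaN. The two characters look almost identical on screen, which is why copy-pasting the values from unique() fixed the problem.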
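A small reproduction may make the mismatch visible. This sketch uses a hypothetical three-value list in place of the real column and writes the em dash as the escape '\u2014' so it can't be confused with a hyphen:

```python
import unicodedata

import pandas as pd

# Labels as they appear in the data: hyphen-minus (U+002D)
data_labels = ['80% - 100%', '0 - 20%', '40% - 60%']
# Labels as typed in the question: em dash (U+2014), except the last one
typed_categories = ['0 \u2014 20%', '20% \u2014 40%', '40% \u2014 60%',
                    '60% \u2014 80%', '80% - 100%']

# The strings look alike but are not equal
print('0 - 20%' == '0 \u2014 20%')                        # False
print(unicodedata.name('-'), '|', unicodedata.name('\u2014'))  # HYPHEN-MINUS | EM DASH

# pd.Categorical turns every value that is missing from `categories` into NaN
cat = pd.Categorical(data_labels, categories=typed_categories, ordered=True)
print(list(cat))  # ['80% - 100%', nan, nan]

# Listing the values with no matching category pinpoints the typo
unmatched = set(data_labels) - set(typed_categories)
print(sorted(unmatched))  # ['0 - 20%', '40% - 60%']
```

Checking `set(values) - set(categories)` before converting is a cheap way to catch this kind of invisible difference in any column.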

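Try using categories=sorted(dist_copy.pct_free_reduced_lunch.unique()) to avoid this kind of typo. Note that sorted() orders the strings lexicographically; that gives the intended order here only because the labels begin with 0, 20, 40, 60 and 80. For other labels you may need to write the order out explicitly.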
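Applied to a frame, the suggestion might look like the sketch below. The `dist_copy` here is a hypothetical stand-in containing only the five labels from the unique() output, and the final check assumes the column had no missing values before conversion:

```python
import pandas as pd

# Hypothetical stand-in for dist_copy, using the labels returned by unique()
dist_copy = pd.DataFrame({
    'pct_free_reduced_lunch': ['80% - 100%', '60% - 80%', '0 - 20%',
                               '20% - 40%', '40% - 60%'],
})

col = 'pct_free_reduced_lunch'
# Taking the categories from the data itself guarantees a character-for-character match.
# sorted() is lexicographic; it yields the intended order only because the labels
# start with 0, 20, 40, 60, 80.
categories = sorted(dist_copy[col].unique())
dist_copy[col] = pd.Categorical(dist_copy[col], categories=categories, ordered=True)

# Sanity check (assumes no NaNs beforehand): nothing should have been dropped to NaN
assert dist_copy[col].notna().all()
print(dist_copy[col].cat.categories.tolist())
# ['0 - 20%', '20% - 40%', '40% - 60%', '60% - 80%', '80% - 100%']
```

The same three lines can be repeated for the other columns you want to convert, as long as each column's labels also sort into the desired order.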
