my data column combines information on two axes: wealth and life stage. I need to break up the two-digit codes by their 'tens'-place and 'ones'-place digits into two new ordinal variables. How would I go about separating and re-encoding this data?
This is the data dictionary for the column:
- -1: unknown
- 11: Wealthy Households - Pre-Family Couples & Singles
- 12: Wealthy Households - Young Couples With Children
- 13: Wealthy Households - Families With School Age Children
- 14: Wealthy Households - Older Families & Mature Couples
- 15: Wealthy Households - Elders In Retirement
- 21: Prosperous Households - Pre-Family Couples & Singles
- 22: Prosperous Households - Young Couples With Children
- 23: Prosperous Households - Families With School Age Children
- 24: Prosperous Households - Older Families & Mature Couples
- 25: Prosperous Households - Elders In Retirement
- 31: Comfortable Households - Pre-Family Couples & Singles
- 32: Comfortable Households - Young Couples With Children
- 33: Comfortable Households - Families With School Age Children
- 34: Comfortable Households - Older Families & Mature Couples
- 35: Comfortable Households - Elders In Retirement
- 41: Less Affluent Households - Pre-Family Couples & Singles
- 42: Less Affluent Households - Young Couples With Children
- 43: Less Affluent Households - Families With School Age Children
- 44: Less Affluent Households - Older Families & Mature Couples
- 45: Less Affluent Households - Elders In Retirement
- 51: Poorer Households - Pre-Family Couples & Singles
- 52: Poorer Households - Young Couples With Children
- 53: Poorer Households - Families With School Age Children
- 54: Poorer Households - Older Families & Mature Couples
- 55: Poorer Households - Elders In Retirement
I have tried np.where() and created conditions:
I have used .map():
data2.map({11:'Wealthy', 12:'Wealthy', 13:'Wealthy', 14:'Wealthy', 15:'Wealthy',
# 21:'Prosperous', 22:'Prosperous', 23:'Prosperous', 24:'Prosperous', 25:'Prosperous',
# 31:'Comfortable', 32:'Comfortable', 33:'Comfortable', 34:'Comfortable', 35:'Comfortable',
# 41:'Less_Afflutent', 42:'Less_Afflutent', 43:'Less_Afflutent', 44:'Less_Afflutent', 45:'Less_Afflutent',
# 51:'Poorer', 52:'Poorer', 53:'Poorer', 54:'Poorer', 55:'Poorer'})
I've tried a couple other things as well that didn't seem worth even posting. I'm new and still learning but I've never come across a a problem like this before and I've done SO much googling
CodePudding user response:
If I understand the question correctly, you want to split your current status column into two new columns and then make mappings for each of those.
# Assuming your current column is called 'status'
df['tens_place'] = df['status']//10 # whole number of division, no remainder, gives -1 for -1
df['ones_place'] = df['status'] # just remainder, gives 9 for -1
tens_map = tens_map = {1:'Wealthy Households',2:'Prosperous households',
3:'Comfortable Households',4:'Less Affluent households' ,5:'Poorer households',
-1:'unknown'}
ones_map = ... # same sorta thing
df['wealth_status'] = df['tens_place'].map(tens_map)
df['family_status'] = df['ones_place'].map(ones_map)