Populating a subset of rows in a dataframe with values from another column / Collapsing several colu

Time:06-12

First time posting here. I expect there's a better way of doing what I'm trying to do. I've been going round in circles for days and would really appreciate some help.

I am working with survey data about prisoners and their sentences. Each prisoner has a type for the purpose of the survey, and this is stored in the column 'prisoner_type'. For each prisoner type, there is a group of 5 columns where their offenses can be recorded (not all columns are necessarily used). I'd like to collapse these groups of columns into one set of 5 columns and add these to the dataset so that, on each row, there is one set of 5 columns where I can find the offenses.

I have created a dictionary to look up the column names that the offence codes and offence types are stored in for each prisoner type. The key in the outer dictionary is the prisoner type. Here is an abridged version:

offense_variables = {
    3: {'codes': {1: 'V0114', 2: 'V0115', 3: 'V0116', 4: 'V0117', 5: 'V0118'},
        'off_types': {1: 'V0124', 2: 'V0125', 3: 'V0126', 4: 'V0127', 5: 'V0128'}},

    8: {'codes': {1: 'V0270', 2: 'V0271', 3: 'V0272', 4: 'V0273', 5: 'V0274'},
        'off_types': {1: 'V0280', 2: 'V0281', 3: 'V0282', 4: 'V0283', 5: 'V0285'}},
}

I am first creating 10 new columns: offense_1...offense_5 and type_1...type_5.

I am then trying to:

  1. Use pandas .loc to locate all the rows for a given prisoner type
  2. Set the values for the new columns by looking up the variable for each offense number under that prisoner type in the dictionary, and assign that column as the new values.

Problems:

  1. The code doesn't terminate. I'm not sure why it's running on and on.
  2. I receive the error message "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead"

pris_types=[3,8]

for pt in pris_types:

  #five offenses are listed in the survey, so we need five columns to hold offence codes
  #and five to hold offence types

  #1 and 2 are just placeholder values    
  for item in [i + 1 for i in range(5)]:
    dataset[f'off_{item}_code']='1'
    dataset[f'off_{item}_type']='2'
  
  #then use .loc to get indexes for this prisoner type
  #look up the variable of the column that we need to take the values from 
  #using the dictionary shown above 

  for item in [i + 1 for i in range(5)]:
    dataset.loc[dataset['prisoner_type'] == pt, \
    dataset[f'off_{item}_code']] = \
    dataset[offense_variables[pt]['codes'][item]]
    
    dataset.loc[dataset[prisoner_type] == pt, \
    dataset[f'off_{item}_type']] = \
    dataset[offense_variables[pt]['types'][item]]

CodePudding user response:

The problem is in your .loc[] sections: to identify the column where values are to be set, you only need the column label (a string), not the entire Series/column object that you are currently passing. With your current code, you are creating new columns named after the values stored in the dataset[f'off_{item}_code'] and dataset[f'off_{item}_type'] columns. So, instead of:

for item in [i + 1 for i in range(5)]:
    dataset.loc[dataset['prisoner_type'] == pt, \
    dataset[f'off_{item}_code']] = \
    dataset[offense_variables[pt]['codes'][item]]
    
    dataset.loc[dataset[prisoner_type] == pt, \
    dataset[f'off_{item}_type']] = \
    dataset[offense_variables[pt]['types'][item]]

use:

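for item in range(1,6):
    # the column indexer is now the label string, not the Series itself
    dataset.loc[dataset['prisoner_type'] == pt, \
    f'off_{item}_code'] = \
    dataset[offense_variables[pt]['codes'][item]]
    
    # 'off_types' matches the key used in the dictionary in the question
    dataset.loc[dataset['prisoner_type'] == pt, \
    f'off_{item}_type'] = \
    dataset[offense_variables[pt]['off_types'][item]]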

(I simplified your range loop line too.)

Also, the statements creating the 10 new columns don't need to be inside the loop over prisoner types; you can move them outside of that loop. In fact, you don't need to create them manually at all: the .loc[] assignment will create them for you.
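
To see the whole thing end to end, here is a minimal sketch on made-up data. The toy DataFrame, its values, and the cut-down two-offense dictionary are assumptions for illustration only; the column names follow the abridged dictionary in the question. It loops over the dictionary items instead of range(1,6), which is just a convenience, and it lets .loc[] create the new columns.

```python
import pandas as pd

# Cut-down lookup (two offenses per type instead of five) -- illustrative only.
offense_variables = {
    3: {'codes': {1: 'V0114', 2: 'V0115'},
        'off_types': {1: 'V0124', 2: 'V0125'}},
    8: {'codes': {1: 'V0270', 2: 'V0271'},
        'off_types': {1: 'V0280', 2: 'V0281'}},
}

# Hypothetical survey rows: each prisoner type fills only its own group of columns.
dataset = pd.DataFrame({
    'prisoner_type': [3, 8, 3],
    'V0114': ['a1', None, 'a3'],
    'V0115': ['b1', None, 'b3'],
    'V0124': ['t1', None, 't3'],
    'V0125': ['u1', None, 'u3'],
    'V0270': [None, 'c2', None],
    'V0271': [None, 'd2', None],
    'V0280': [None, 'v2', None],
    'V0281': [None, 'w2', None],
})

for pt, variables in offense_variables.items():
    # Boolean mask selecting the rows for this prisoner type
    mask = dataset['prisoner_type'] == pt

    for item, source_col in variables['codes'].items():
        # Column indexer is a label string; .loc creates the column if missing
        dataset.loc[mask, f'off_{item}_code'] = dataset[source_col]

    for item, source_col in variables['off_types'].items():
        dataset.loc[mask, f'off_{item}_type'] = dataset[source_col]

print(dataset[['prisoner_type', 'off_1_code', 'off_2_code',
               'off_1_type', 'off_2_type']])
```

The right-hand side is the full source column; pandas aligns it on the index, so only the rows selected by the mask are written.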
