I want to combine six columns that represent a person's "concern" over a virus into a single column. Each column holds a 1 for yes or a 0 for no, and each row has exactly one 1 across the six columns. The combined column should map them as follows:
- Columns 1, 2 and 3 should be 0 for concern
- Column 4 should be 1
- Column 5 should be 2
- Column 6 should be 3
Example:
Column1 | Column2 | Column3 | Column4 | Column5 | Column6 |
---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 0 | 0 |
0 | 0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 0 | 1 |
Result:
Column7 |
---|
0 |
0 |
0 |
1 |
2 |
3 |
I have tried the code below, but it returns a column containing only 0s.
#setup the logic for getting the responses of 1 in each of the columns
conditions = [(survey['Column1'] == '1') | (survey['Column2'] == '1') | (survey['Column3'] == '1'),
(survey['Column4'] == '1'), (survey['Column5']) == '1', (survey['Column6']) == '1']
#setup the values that are going to be placed into the column for the conditions
values = [0, 1, 2, 3]
#creating the column
df['Column7'] = np.select(conditions, values, default = 0)
This dataset is going to be used to build some predictive models, but I'm also wondering whether I'm making this too hard and should just leave the columns as 0/1 instead of combining them into a single column of values.
CodePudding user response:
You can define a weight for each column, multiply each column by its weight, and then sum the results across each row:
weight = [0, 0, 0, 1, 2, 3]
df["Column7"] = df.loc[:, "Column1":"Column6"].mul(weight).sum(axis=1)
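As a self-contained sketch of the weighted sum above, the snippet below rebuilds the example table from the question and applies the same weights. The `combine_concern` helper name and the `pd.to_numeric` step are assumptions that are not part of the original code; the conversion is only there in case the survey columns were read in as strings. It also includes the `np.select` attempt with integer comparisons: if the columns actually hold integers, comparing against the string `'1'` never matches, which would likely explain the all-zero column.

```python
import numpy as np
import pandas as pd

# Rebuild the example table from the question: exactly one 1 per row.
cols = [f"Column{i}" for i in range(1, 7)]
df = pd.DataFrame(np.eye(6, dtype=int), columns=cols)

# One weight per column, in column order (Column1..Column3 -> 0, then 1, 2, 3).
weight = [0, 0, 0, 1, 2, 3]


def combine_concern(frame):
    """Hypothetical helper: weighted row sum over Column1..Column6."""
    # Assumption: coerce to numbers in case the values were loaded as '0'/'1'.
    numeric = frame[cols].apply(pd.to_numeric)
    return numeric.mul(weight).sum(axis=1)


df["Column7"] = combine_concern(df)

# The np.select version from the question, comparing against integers
# instead of the string '1' and using a single DataFrame name throughout.
conditions = [
    (df["Column1"] == 1) | (df["Column2"] == 1) | (df["Column3"] == 1),
    df["Column4"] == 1,
    df["Column5"] == 1,
    df["Column6"] == 1,
]
values = [0, 1, 2, 3]
df["Column7_select"] = np.select(conditions, values, default=0)

print(df[["Column7", "Column7_select"]])
```

Both columns should agree on this example. The weighted sum relies on each row having exactly one 1, as described in the question; a row with several 1s would add their weights together, whereas `np.select` would take the first matching condition.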