Home > Mobile >  Is is possible to one-hot based on a list of values being an element in a column?
Is is possible to one-hot based on a list of values being an element in a column?

Time:04-14

Here's the dataframe that I'm working with:

| Name     | Vehicle Types Owned  |
| -------- | --------------       |
| Bob      | [Car, Bike]          |
| Billy    | [Car, Truck, Train]  |
| Rob      | [Plane, Train, Boat] |
| Sally    | [Bike, Boat]         |

I am looking for something like this

| Name | Car | Bike | Truck | Train | Plane | Boat |
| ---- | ----| ---  | ---   | ---   | ---   | ---  |
| Bob  | 1   | 1    | 0     | 0     | 0     | 0    |                 
| Billy| 1   | 0    | 1     | 1     | 0     | 0    |
| Rob  | 0   | 0    | 0     | 1     | 1     | 1    |
| Sally| 0   | 1    | 0     | 0     | 0     | 1    |

The original dataframe looked like this, in case that'd be more useful to work with:

| Name     | Vehicle Type Owned | Num Wheels | Top Speed | 
| -------- | --------------     | ---------  | --------- |
| Bob      | Car                | 4          | 200 mph   |
| Bob      | Bike               | 2          | 20 mph    |
| Billy    | Car                | 4          | 220 mph   |
| Billy    | Truck              | 8          | 100 mph   |
| Billy    | Train              | 80         | 86  mph   |
| Rob      | Plane              | 3          | 600 mph   |
| Rob      | Train              | 80         | 98  mph   |
| Rob      | Boat               | 3          | 128 mph   |
| Sally    | Bike               | 2          | 34  mph   |
| Sally    | Boat               | 3          | 78 mph    |

I'm using pandas.

CodePudding user response:

Try explode then crosstab

df = df.explode('Vehicle Types Owned')
df_ = pd.crosstab([df['Name']], df['Vehicle Types Owned']).reset_index().rename_axis(None, axis=1)
print(df_)

    Name  Bike  Boat  Car  Plane  Train  Truck
0  Billy     0     0    1      0      1      1
1    Bob     1     0    1      0      0      0
2    Rob     0     1    0      1      1      0
3  Sally     1     1    0      0      0      0

CodePudding user response:

From the original dataframe, you can use pivot_table:

df.assign(count=1).pivot_table(index='Name', columns='Vehicle Type Owned', values='count', fill_value=0)

Result:

Vehicle Type Owned  Bike  Boat  Car  Plane  Train  Truck
Name                                                    
Billy                  0     0    1      0      1      1
Bob                    1     0    1      0      0      0
Rob                    0     1    0      1      1      0
Sally                  1     1    0      0      0      0
  • Related