I have a date frame that I need to switch out values from it, like "BOX TRUCK" to just "TRUCK". I have tried to use the str.replace function but unfortunately it only changes the value of "TRUCK" in "BOX TRUCK" to "TRUCK" so I am left with the same issue. I need to be able to change multiple different values "BOX TRUck, FIRE TRUCK, Dumpster truck, etc." and so I have tried to look for "truck" and change them but it has not been working.
dft['VEHICLE_TYPE'] = dft['VEHICLE_TYPE'].str.replace(r'truck', 'TRUCK', regex=True, flags=re.IGNORECASE)
CodePudding user response:
Another method would be using a mask with str.contains
instead of str.replace
.
Also, rather than using an apply, use pd.Series.mask
mask = dft['VEHICLE_TYPE'].str.contains('truck', case=False)
dft['VEHICLE_TYPE'] = dft['VEHICLE_TYPE'].mask(mask, 'TRUCK')
CodePudding user response:
Considering that OP has a dataframe like the following
df = pd.DataFrame(np.random.randint(10,100,size=(10, 2)), columns=['Count', 'Vehicle_type'])
df['Vehicle_type'] = np.random.choice(['BOX TRUCK', 'FIRE TRUCK', 'Dumpster truck'], p=[0.5, 0.3, 0.2], size=10)
[Out]:
Count Vehicle_type
0 35 BOX TRUCK
1 30 Dumpster truck
2 34 FIRE TRUCK
3 69 BOX TRUCK
4 13 BOX TRUCK
5 70 BOX TRUCK
6 53 FIRE TRUCK
7 56 Dumpster truck
8 43 Dumpster truck
9 40 BOX TRUCK
Assuming that the goal is to change to TRUCK
anytime the value is BOX TRUCK
, FIRE TRUCK
, or, Dumpster truck
, then the following will do the work
df['Vehicle_type'] = df['Vehicle_type'].apply(lambda x: 'TRUCK' if x.lower() in ['box truck', 'fire truck', 'dumpster truck'] else x)
Count Vehicle_type
0 79 TRUCK
1 74 TRUCK
2 25 TRUCK
3 38 TRUCK
4 24 TRUCK
5 45 TRUCK
6 83 TRUCK
7 20 TRUCK
8 63 TRUCK
9 49 TRUCK
Notes:
One is using
x.lower()
to be able to ignore case.If the column has additional categories/vehicle types that one wants to consider for the change, one can just add them to the list in the lambda function (write them with lower case).
An alternative, that doesn't require one to know the types that exist, as longs as it contains the string truck
is using str.contains
, as follows
df['Vehicle_type'] = df['Vehicle_type'].apply(lambda x: 'TRUCK' if x.lower().str.contains('truck') else x)
Or with str.replace
, such as
df['Vehicle_type'] = df['Vehicle_type'].apply(lambda x: x.lower().str.replace('truck', 'TRUCK'))
If, on another hand, the goal is to replace, from the column Vehicle_type
, BOX TRUCK
to Regular Truck
(the same logic can be applied to other changes), the following will do the work
df['Vehicle_type'] = df['Vehicle_type'].replace('BOX TRUCK', 'Regular Truck')
[Out]:
Count Vehicle_type
0 35 Regular Truck
1 30 Dumpster truck
2 34 FIRE TRUCK
3 69 Regular Truck
4 13 Regular Truck
5 70 Regular Truck
6 53 FIRE TRUCK
7 56 Dumpster truck
8 43 Dumpster truck
9 40 Regular Truck