Create new variables that are equal but in a different type

Time:12-03

Some of the variables in my df have object dtype. I want to convert them to int and add the results to the df as new variables.

This is what I tried:

df_l['price2'] = df_l['price'].astype('int')

df_l['host_acceptance_rate2'] = df_l['host_acceptance_rate'].astype('int')

df_l['host_is_superhost2'] = df_l['host_is_superhost'].astype('int')

I was expecting them to change dtype, but for each of them I received "invalid literal for int() with base 10", for the values "$18,000", "79%" and "t" respectively.

CodePudding user response:

It looks like you're trying to convert columns that contain string values that cannot be directly converted to integers. Judging by the error message, the price column contains values like "$18,000", the host_acceptance_rate column contains values like "79%", and the host_is_superhost column contains the string "t" or "f".

You can't convert these values to integers using the .astype() method because they don't represent valid integers. Instead, you'll need to clean the data and extract the relevant information from these strings before you can convert them to integers.

For example, you could use the str.replace() method to remove the "$" and "," characters from the price values, and the str.rstrip() method to remove the trailing "%" from the host_acceptance_rate values, and then convert the results to integers. Similarly, you could use a conditional expression to convert the host_is_superhost values to 1 if the value is "t" and 0 if the value is "f".

Here's an example of how you could accomplish this:

# Remove "$" and "," from price values, then convert (going through float also handles values with decimals)
df_l['price2'] = df_l['price'].str.replace(r'[$,]', '', regex=True).astype('float').astype('int')

# Strip the trailing "%" from host_acceptance_rate values, then convert to int
df_l['host_acceptance_rate2'] = df_l['host_acceptance_rate'].str.rstrip('%').astype('int')

# Convert host_is_superhost values to 1 if "t" and 0 otherwise (including "f" and missing values)
df_l['host_is_superhost2'] = df_l['host_is_superhost'].apply(lambda x: 1 if x == "t" else 0)

You may need to adjust this code depending on the exact format of the data in your DataFrame.
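
To check what formats a column actually contains before adjusting the cleaning code, a quick inspection along these lines may help. This is only a sketch: the sample rows are invented for illustration (only the column names and the values "$18,000", "79%" and "t" come from the question), and non_numeric_values is a hypothetical helper name. It relies on pd.to_numeric with errors="coerce", which turns unparseable entries into NaN.

```python
import pandas as pd

# Hypothetical sample data; only the column names and the values
# "$18,000", "79%" and "t" are taken from the question.
df_l = pd.DataFrame({
    "price": ["$18,000", "$250", "$1,200"],
    "host_acceptance_rate": ["79%", "100%", "5%"],
    "host_is_superhost": ["t", "f", "t"],
})

def non_numeric_values(series):
    """Return the distinct non-missing values that cannot be parsed as numbers."""
    parsed = pd.to_numeric(series, errors="coerce")
    # Keep entries that failed to parse but were not missing to begin with.
    bad = series[parsed.isna() & series.notna()]
    return sorted(bad.unique())

for col in ["price", "host_acceptance_rate", "host_is_superhost"]:
    print(col, non_numeric_values(df_l[col]))
```

Any value that shows up in this listing needs some cleaning step before the column can be converted to int.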

Note that this is just one way to approach this problem, and there may be other ways to clean and convert the data that work better for your specific use case.
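
As one such alternative, the sketch below tolerates missing values by converting to pandas' nullable Int64 dtype instead of plain int. It assumes the columns may contain missing entries and that prices may carry decimals, neither of which is confirmed by the question. The sample data and the helper name to_nullable_int are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, including missing entries (None).
df_l = pd.DataFrame({
    "price": ["$18,000", "$250.00", None],
    "host_acceptance_rate": ["79%", None, "100%"],
    "host_is_superhost": ["t", "f", None],
})

def to_nullable_int(series, pattern):
    """Remove characters matching `pattern`, then parse to nullable Int64."""
    cleaned = series.str.replace(pattern, "", regex=True)
    # Unparseable or missing entries become NaN instead of raising an error.
    numbers = pd.to_numeric(cleaned, errors="coerce")
    # Round to the nearest whole number so the cast to Int64 is exact.
    return numbers.round().astype("Int64")

df_l["price2"] = to_nullable_int(df_l["price"], r"[$,]")
df_l["host_acceptance_rate2"] = to_nullable_int(df_l["host_acceptance_rate"], r"%")

# Map "t"/"f" to 1/0; anything else (including missing values) stays missing.
df_l["host_is_superhost2"] = (
    df_l["host_is_superhost"].map({"t": 1, "f": 0}).astype("Int64")
)

print(df_l.dtypes)
```

With this approach missing entries stay missing (shown as <NA>) rather than being silently turned into 0, which may or may not be what you want.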
