Why pandas does not want to subset given columns in a list

Time:10-28

I'm trying to remove certain values with the code below, but pandas won't let me and instead raises:

ValueError: Unable to coerce to Series, length must be 10: given 2

Here is my code:

import pandas as pd
df = pd.read_csv("/Volumes/SSD/IT/DataSets/Automobile_data.csv")
print(df.shape)
columns_df  = ['index', 'company', 'body-style', 'wheel-base', 'length', 'engine-type',
       'num-of-cylinders', 'horsepower', 'average-mileage', 'price']
prohibited_symbols = ['?','Nan''n.a']
df = df[df[columns_df] != prohibited_symbols]
print(df)

CodePudding user response:

Try:

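pattern = '|'.join(re.escape(s) for s in prohibited_symbols)  # needs `import re`
df = df[~df[columns_df].astype(str).apply(lambda col: col.str.contains(pattern)).any(axis=1)]

The regex operator '|' matches any one of your prohibited symbols, so this drops every row that contains at least one of them. The symbols are escaped first because '?' is a regex metacharacter, and .str is applied column by column because it is a Series accessor, not a DataFrame one.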
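
The row filter can be tried on a small frame first. This is only a sketch under stated assumptions: the toy values below are hypothetical stand-ins for the CSV, and `has_bad` and `clean` are names introduced here for illustration.

```python
import re
import pandas as pd

# Toy data standing in for the CSV (hypothetical values).
df = pd.DataFrame({
    "company": ["audi", "?", "bmw", "n.a"],
    "price": ["13950", "16500", "?", "17450"],
    "length": [176.6, 171.2, 176.8, 189.0],
})
columns_df = ["company", "price", "length"]
prohibited_symbols = ["?", "Nan", "n.a"]

# Escape each symbol so '?' and '.' are matched literally, then join with '|'.
pattern = "|".join(re.escape(s) for s in prohibited_symbols)

# .str exists only on Series, so apply it column by column;
# astype(str) lets numeric columns go through the same check.
has_bad = df[columns_df].astype(str).apply(lambda col: col.str.contains(pattern))

# Keep rows where no selected column contains a prohibited symbol.
clean = df[~has_bad.any(axis=1)]
print(clean)
```

Note that `contains` is a substring match, so a symbol such as `n.a` would also flag a longer string that happens to include it.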

CodePudding user response:

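Because that line does not do what you expect it to.
df = df[df[columns_df] != prohibited_symbols]
!= is a plain element-wise comparison, not a membership test. When you compare a DataFrame with a list, pandas tries to line the list up with the columns, one list element per column, so it expects 10 values. Your list holds only 2, because the missing comma in 'Nan''n.a' makes Python join the two strings into 'Nann.a'. That is where "length must be 10: given 2" comes from. Even with a list of the right length, each column would be compared only with the element in its own position, not with every prohibited symbol. And indexing with the resulting DataFrame-shaped mask would not remove any rows: it keeps all of them and turns the cells where the mask is False into NaN.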
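
For comparison, a membership test can be written with `DataFrame.isin`. The following is a minimal sketch on hypothetical toy data, not the asker's file; `is_bad`, `masked` and `filtered` are illustrative names.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "company": ["audi", "?", "bmw"],
    "price": ["13950", "16500", "n.a"],
})
prohibited_symbols = ["?", "Nan", "n.a"]

# isin() asks "is this cell one of the listed values?" for every cell,
# which is the membership test that != cannot express.
is_bad = df.isin(prohibited_symbols)

# A DataFrame-shaped mask blanks cells to NaN instead of dropping rows ...
masked = df[~is_bad]

# ... so reduce the mask to one boolean per row to actually filter rows.
filtered = df[~is_bad.any(axis=1)]
print(masked)
print(filtered)
```

`masked` keeps the original shape with the offending cells set to NaN, while `filtered` contains only the rows without any prohibited value.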

You'll have to use a for loop and clean every column like this.

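pattern = '|'.join(re.escape(s) for s in prohibited_symbols)  # needs `import re`; escapes '?'
for column in df[columns_df].select_dtypes(exclude='number').columns:  # .str needs text columns
    df[column] = df[column].str.replace(pattern, '', regex=True)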
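
Stripping the symbols leaves empty strings behind in the affected cells. If the goal is instead to drop rows where a cell is exactly one of the symbols, one possible alternative (a different technique from the loop above, sketched here on hypothetical toy data) is to turn those cells into NaN and then call `dropna`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "company": ["audi", "?", "bmw"],
    "price": ["13950", "16500", "n.a"],
})
columns_df = ["company", "price"]
prohibited_symbols = ["?", "Nan", "n.a"]

# Swap whole-cell matches for NaN (exact equality, no regex involved).
df[columns_df] = df[columns_df].replace(prohibited_symbols, np.nan)

# Drop rows that now have a missing value in any selected column.
df = df.dropna(subset=columns_df)
print(df)
```

Because this matches whole cell values only, a string that merely contains one of the symbols is left untouched.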