Python, pandas: Removing values with a small length in dataframe

Time:12-10

I am trying to remove all values in this pandas dataframe that have a length of less than 3, but only in some of the columns:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],'player': ['w', 'George', 'Roland'], 'hometown': ['Miami', 'Caracas', 'Mexico City'], 'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

for column_name in columns_to_add:
    df.loc[(len(df[column_name]) < 3), column_name] = None

I tried the code above, but I get the following error:

KeyError("cannot use a single bool to index into setitem")


CodePudding user response:

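You can use applymap to calculate the length of each value, then np.where to update (this needs import numpy as np; in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):

df[columns_to_add] = np.where(df[columns_to_add].applymap(len) >= 3,
                              df[columns_to_add], None)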

Output:

   id  player     hometown current_city
0   1    None        Miami     New York
1   2  George      Caracas         None
2   3  Roland  Mexico City     New York
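
For anyone who wants to run this end to end, here is a self-contained sketch of the same applymap/np.where idea. The hasattr check is my own addition, not part of the answer above: it assumes you may be on either side of pandas 2.1, where DataFrame.map took over from applymap.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'player': ['w', 'George', 'Roland'],
                   'hometown': ['Miami', 'Caracas', 'Mexico City'],
                   'current_city': ['New York', '-', 'New York']})
columns_to_add = ['player', 'hometown', 'current_city']

subset = df[columns_to_add]
# Assumption: use DataFrame.map when it exists (pandas >= 2.1),
# otherwise fall back to the older applymap.
elementwise = subset.map if hasattr(subset, "map") else subset.applymap
lengths = elementwise(len)

# Keep values with at least 3 characters, blank out the rest.
df[columns_to_add] = np.where(lengths >= 3, subset, None)
print(df)
```

Only the three listed columns are touched, so id is left as it was.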

CodePudding user response:

Try this (with numpy imported as np):

df[df[columns_to_add].apply(lambda col: col.str.len() < 3)] = np.nan

Output:

>>> df
   id  player     hometown current_city
0   1     NaN        Miami     New York
1   2  George      Caracas          NaN
2   3  Roland  Mexico City     New York
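
As for why the original loop fails: len(df[column_name]) returns the number of rows in the column, which is a single integer, so the comparison with 3 gives one bool instead of a per-row mask. A minimal sketch of the same loop with a per-row mask built from Series.str.len() (the variable name too_short is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'player': ['w', 'George', 'Roland'],
                   'hometown': ['Miami', 'Caracas', 'Mexico City'],
                   'current_city': ['New York', '-', 'New York']})
columns_to_add = ['player', 'hometown', 'current_city']

for column_name in columns_to_add:
    # One boolean per row: True where the string has fewer than 3 characters.
    too_short = df[column_name].str.len() < 3
    # Blank out only the flagged rows in this column.
    df.loc[too_short, column_name] = None

print(df)
```

This keeps the structure of the question's code and only changes how the mask is computed.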

CodePudding user response:

You can use the replace method on each column:

def find_string_less_length(list_of_values):
    # Collect the values that are shorter than 3 characters
    return [i for i in list_of_values if len(i) < 3]

for column_name in columns_to_add:
    # Note: this writes the string 'none', not a missing value
    df[column_name] = df[column_name].replace(
        find_string_less_length(df[column_name].values), 'none')

CodePudding user response:

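I think the simplest solution might be the following (it returns a new frame containing only the selected columns, with NaN in place of the short values):

new_df = df[columns_to_add]
new_df[new_df.applymap(len) >= 3]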
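
If you want the result written back into the original frame, so that id is kept, a rough sketch along the same lines could use DataFrame.where. This is a variation on the idea above rather than the answer's exact code, and it assumes every value in the selected columns is a string:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'player': ['w', 'George', 'Roland'],
                   'hometown': ['Miami', 'Caracas', 'Mexico City'],
                   'current_city': ['New York', '-', 'New York']})
columns_to_add = ['player', 'hometown', 'current_city']

subset = df[columns_to_add]
# Per-column string lengths, same shape as subset.
lengths = subset.apply(lambda col: col.str.len())
# where() keeps values where the condition holds and puts NaN elsewhere.
df[columns_to_add] = subset.where(lengths >= 3)

print(df)
```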