Set column values to NaN in a pandas DataFrame when a row-wise condition is met

Time:11-04

Given the following pandas DataFrame:

country ctry city cty other important other_important other_1
France France París París blue 019210 0011119 red
Spain Spain Madrid Barcelona blue 1211 0019210 blue
Germany Spain Barcelona Barcelona white 019210 1212 red
France UK Bourdeux London blue 019210 91021 red

I need to set the unimportant columns (other, other_1) to NaN on every row where country != ctry or city != cty. Expected result:

country ctry city cty other important other_important other_1
France France París París blue 019210 0011119 red
Spain Spain Madrid Barcelona NaN 1211 0019210 NaN
Germany Spain Barcelona Barcelona NaN 019210 1212 NaN
France UK Bourdeux London NaN 019210 91021 NaN

Finally, I drop the country and city columns:

    df = df.drop(['country', 'city'], axis=1)

ctry cty other important other_important other_1
France París blue 019210 0011119 red
Spain Barcelona NaN 1211 0019210 NaN
Spain Barcelona NaN 019210 1212 NaN
UK London NaN 019210 91021 NaN

Ideally, the columns to set to NaN would be given as a list of column names, e.g. ['other', 'other_1'].

CodePudding user response:

Use DataFrame.loc to set missing values on the rows where the condition holds:

import numpy as np

cols = ['other', 'other_1']
df.loc[df.country.ne(df.ctry) | df.city.ne(df.cty), cols] = np.nan
df = df.drop(['country', 'city'], axis=1)

To remove the country and city columns in the same step, use DataFrame.pop:

cols = ['other','other_1']
df.loc[df.pop('country').ne(df.ctry) | df.pop('city').ne(df.cty), cols] = np.nan
print(df)
     ctry        cty other  important  other_important other_1
0  France      París  blue      19210            11119     red
1   Spain  Barcelona   NaN       1211            19210     NaN
2   Spain  Barcelona   NaN      19210             1212     NaN
3      UK     London   NaN      19210            91021     NaN
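
Note that the output above shows 19210 rather than 019210, so the code columns were evidently parsed as integers. Below is a minimal, self-contained sketch of the same approach that keeps the leading zeros. It assumes the table from the question is read as whitespace-separated text with every column as a string; the names `text` and `mismatch` are only illustrative.

```python
import io

import numpy as np
import pandas as pd

# The sample table from the question, as whitespace-separated text.
text = """country ctry city cty other important other_important other_1
France France París París blue 019210 0011119 red
Spain Spain Madrid Barcelona blue 1211 0019210 blue
Germany Spain Barcelona Barcelona white 019210 1212 red
France UK Bourdeux London blue 019210 91021 red
"""

# dtype=str keeps the codes as text, so leading zeros are preserved.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", dtype=str)

cols = ["other", "other_1"]

# pop() returns the column and removes it from df in the same step.
mismatch = df.pop("country").ne(df["ctry"]) | df.pop("city").ne(df["cty"])

# Blank out the listed columns on the mismatching rows only.
df.loc[mismatch, cols] = np.nan
print(df)
```

Computing the mask on its own line first makes it easier to inspect which rows will be blanked before anything is overwritten.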

CodePudding user response:

# list of columns
cols=['other', 'other_1']

# use mask to make NaN when condition is met
df[cols] = df[cols].mask(df['country'].ne(df['ctry']) | df['city'].ne(df['cty']))

# drop columns
df = df.drop(['country', 'city'], axis=1)
df
     ctry        cty other  important  other_important other_1
0  France      París  blue      19210            11119     red
1   Spain  Barcelona   NaN       1211            19210     NaN
2   Spain  Barcelona   NaN      19210             1212     NaN
3      UK     London   NaN      19210            91021     NaN
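
Since the question asks for the columns to be passed as a list of names, the mask approach can be wrapped in a small helper. This is only a sketch: `blank_on_mismatch`, `pairs` and the reduced three-row sample are hypothetical names and data chosen for illustration, not part of pandas or of the original post.

```python
import pandas as pd


def blank_on_mismatch(df, pairs, cols):
    """Return a copy of df with `cols` set to NaN on rows where any
    (left, right) pair in `pairs` disagrees, with the left columns dropped."""
    mismatch = pd.Series(False, index=df.index)
    for left, right in pairs:
        # A row is flagged as soon as one pair of columns differs.
        mismatch |= df[left].ne(df[right])
    out = df.copy()
    # mask() replaces values with NaN where the condition is True.
    out[cols] = out[cols].mask(mismatch)
    return out.drop(columns=[left for left, _ in pairs])


# Reduced sample with only the columns the helper touches.
df = pd.DataFrame({
    "country": ["France", "Spain", "Germany"],
    "ctry": ["France", "Spain", "Spain"],
    "city": ["París", "Madrid", "Barcelona"],
    "cty": ["París", "Barcelona", "Barcelona"],
    "other": ["blue", "blue", "white"],
    "other_1": ["red", "blue", "red"],
})

result = blank_on_mismatch(
    df,
    pairs=[("country", "ctry"), ("city", "cty")],
    cols=["other", "other_1"],
)
print(result)
```

The helper works on a copy, so the original DataFrame stays unchanged and the call can be repeated with a different list of columns.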