Consider the following DataFrame in pandas:
country | ctry | city | cty | other | important | other_important | other_1 |
---|---|---|---|---|---|---|---|
France | France | París | París | blue | 019210 | 0011119 | red |
Spain | Spain | Madrid | Barcelona | blue | 1211 | 0019210 | blue |
Germany | Spain | Barcelona | Barcelona | white | 019210 | 1212 | red |
France | UK | Bourdeux | London | blue | 019210 | 91021 | red |
I need to set the unimportant columns (here `other` and `other_1`) to NaN in every row where `country != ctry` or `city != cty`. The resulting DataFrame:
country | ctry | city | cty | other | important | other_important | other_1 |
---|---|---|---|---|---|---|---|
France | France | París | París | blue | 019210 | 0011119 | red |
Spain | Spain | Madrid | Barcelona | NaN | 1211 | 0019210 | NaN |
Germany | Spain | Barcelona | Barcelona | NaN | 019210 | 1212 | NaN |
France | UK | Bourdeux | London | NaN | 019210 | 91021 | NaN |
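A minimal sketch for reproducing the example. It assumes the `important` and `other_important` values are stored as strings so that leading zeros such as `019210` survive. The boolean Series marks the rows where either pair of columns disagrees:

```python
import pandas as pd

# Input data from the question; numeric-looking codes kept as strings (assumption)
df = pd.DataFrame({
    "country": ["France", "Spain", "Germany", "France"],
    "ctry": ["France", "Spain", "Spain", "UK"],
    "city": ["París", "Madrid", "Barcelona", "Bourdeux"],
    "cty": ["París", "Barcelona", "Barcelona", "London"],
    "other": ["blue", "blue", "white", "blue"],
    "important": ["019210", "1211", "019210", "019210"],
    "other_important": ["0011119", "0019210", "1212", "91021"],
    "other_1": ["red", "blue", "red", "red"],
})

# True where country != ctry OR city != cty
mismatch = (df["country"] != df["ctry"]) | (df["city"] != df["cty"])
print(mismatch.tolist())  # [False, True, True, True]
```

Only the first row has both pairs matching, so it is the only row whose `other` columns stay filled.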
Finally, I drop the `country` and `city` columns:
```python
df = df.drop(['country', 'city'], axis=1)
```
ctry | cty | other | important | other_important | other_1 |
---|---|---|---|---|---|
France | París | blue | 019210 | 0011119 | red |
Spain | Barcelona | NaN | 1211 | 0019210 | NaN |
Spain | Barcelona | NaN | 019210 | 1212 | NaN |
UK | London | NaN | 019210 | 91021 | NaN |
Ideally, the columns to be set to NaN would be given as a list of column names, e.g. `['other', 'other_1']`.
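If the rule ever needs to cover more column pairs, the list-of-names idea can be wrapped in a small function. This is only a sketch: `blank_on_mismatch` and its `pairs` parameter are hypothetical names, not part of pandas, and the two-row frame is made-up sample data.

```python
import numpy as np
import pandas as pd


def blank_on_mismatch(df, cols, pairs):
    """Hypothetical helper: return a copy of df with `cols` set to NaN in
    every row where any (left, right) column pair in `pairs` differs."""
    out = df.copy()
    cond = pd.Series(False, index=out.index)
    for left, right in pairs:
        cond |= out[left].ne(out[right])  # OR together each pair's mismatch
    out.loc[cond, cols] = np.nan
    return out


# Illustrative two-row frame
df = pd.DataFrame({
    "country": ["France", "Spain"],
    "ctry": ["France", "Spain"],
    "city": ["París", "Madrid"],
    "cty": ["París", "Barcelona"],
    "other": ["blue", "blue"],
    "other_1": ["red", "blue"],
})
result = blank_on_mismatch(df, ["other", "other_1"],
                           [("country", "ctry"), ("city", "cty")])
result = result.drop(columns=["country", "city"])
print(result)
```

Working on a copy keeps the original frame intact; drop the `copy()` call if in-place modification is preferred.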
Answer:
Use `DataFrame.loc` to set missing values in the listed columns wherever the condition holds:
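```python
import numpy as np

cols = ['other', 'other_1']
# Rows where country/ctry or city/cty differ get NaN in the listed columns
df.loc[df.country.ne(df.ctry) | df.city.ne(df.cty), cols] = np.nan
df = df.drop(['country', 'city'], axis=1)
```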
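A self-contained run of the `loc` approach on the question's data (codes typed in as strings, an assumption), checking that only the listed columns are blanked and the important ones are left untouched:

```python
import numpy as np
import pandas as pd

# Question's data; codes kept as strings to preserve leading zeros (assumption)
df = pd.DataFrame({
    "country": ["France", "Spain", "Germany", "France"],
    "ctry": ["France", "Spain", "Spain", "UK"],
    "city": ["París", "Madrid", "Barcelona", "Bourdeux"],
    "cty": ["París", "Barcelona", "Barcelona", "London"],
    "other": ["blue", "blue", "white", "blue"],
    "important": ["019210", "1211", "019210", "019210"],
    "other_important": ["0011119", "0019210", "1212", "91021"],
    "other_1": ["red", "blue", "red", "red"],
})
before = df[["important", "other_important"]].copy()

cols = ["other", "other_1"]
df.loc[df.country.ne(df.ctry) | df.city.ne(df.cty), cols] = np.nan
df = df.drop(["country", "city"], axis=1)

# The important columns are unchanged; only `cols` received NaN
print(df[["important", "other_important"]].equals(before))  # True
print(df)
```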
Alternatively, to remove the `country` and `city` columns in the same step, use `DataFrame.pop`:
```python
import numpy as np

cols = ['other', 'other_1']
# pop() returns each column and removes it from df in place
df.loc[df.pop('country').ne(df.ctry) | df.pop('city').ne(df.cty), cols] = np.nan
print(df)
```
```
     ctry        cty  other  important  other_important  other_1
0  France      París   blue      19210            11119      red
1   Spain  Barcelona    NaN       1211            19210      NaN
2   Spain  Barcelona    NaN      19210             1212      NaN
3      UK     London    NaN      19210            91021      NaN
```
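`DataFrame.pop` returns the requested column as a Series and deletes it from the DataFrame in place, so the comparison and the column removal happen in a single expression. A minimal check of that behaviour (the two-row frame is made up for illustration):

```python
import pandas as pd

# Toy two-row frame (illustrative data, not from the question)
df = pd.DataFrame({"country": ["France", "Germany"],
                   "ctry": ["France", "Spain"]})

# pop() hands back the column and removes it from df at the same time
mismatch = df.pop("country").ne(df["ctry"])

print(mismatch.tolist())   # [False, True]
print(list(df.columns))    # ['ctry']
```

Because `pop` mutates `df`, running the answer's line twice raises a `KeyError`; rebuild the frame before re-running it.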
Answer:
```python
# list of columns to blank out
cols = ['other', 'other_1']
# mask() sets values to NaN where the condition is True
df[cols] = df[cols].mask(df['country'].ne(df['ctry']) | df['city'].ne(df['cty']))
# drop the comparison columns
df = df.drop(['country', 'city'], axis=1)
print(df)
```
```
     ctry        cty  other  important  other_important  other_1
0  France      París   blue      19210            11119      red
1   Spain  Barcelona    NaN       1211            19210      NaN
2   Spain  Barcelona    NaN      19210             1212      NaN
3      UK     London    NaN      19210            91021      NaN
```
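`DataFrame.mask` is the inverse of `DataFrame.where`: `mask(cond)` replaces values with NaN where `cond` is True, while `where(cond)` keeps them. In both cases the boolean Series is applied row by row to every selected column. A small sketch on toy data (assumed, a subset of the question's rows) confirming that `where(~cond)` matches the answer's `mask(cond)`:

```python
import pandas as pd

# Toy subset of the question's rows (illustrative)
df = pd.DataFrame({
    "country": ["France", "Spain", "Germany"],
    "ctry": ["France", "Spain", "Spain"],
    "city": ["París", "Madrid", "Barcelona"],
    "cty": ["París", "Barcelona", "Barcelona"],
    "other": ["blue", "blue", "white"],
    "other_1": ["red", "blue", "red"],
})
cols = ["other", "other_1"]
cond = df["country"].ne(df["ctry"]) | df["city"].ne(df["cty"])

# mask(): NaN where cond is True
masked = df[cols].mask(cond)
# where(): keep values where ~cond is True, NaN elsewhere
kept = df[cols].where(~cond)

print(masked.equals(kept))  # True
```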