I have the following dataframe:
import pandas as pd
import numpy as np

d_test = {
    'latitude': [40.765004, 45.504641, 40.00000, -34.283007],
    'longitude': [-73.965961, 122.000000, -90.121679, -118.439891],
    'test': [1, 2, 3, 4],
}
df_test = pd.DataFrame(d_test)
I want to use the regex r'^[ -]?[0-9]*[.][0]*?$' to put np.nan where the condition matches.
I know how to filter the matching rows out completely using this regex:
df_test[~df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)]
the code above gives:
latitude longitude test
0 40.765004 -73.965961 1
3 -34.283007 -118.439891 4
but I need the following result:
latitude longitude test
0 40.765004 -73.965961 1
1 NaN NaN 2
2 NaN NaN 3
3 -34.283007 -118.439891 4
I tried to use pandas' where:
~df_test[['latitude', 'longitude']].where(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True))
but it does not work (AttributeError: 'DataFrame' object has no attribute 'str').
CodePudding user response:
You just need boolean indexing with pandas.DataFrame.loc. (The .str accessor only exists on a Series, which is why your where attempt raised the AttributeError; apply runs str.contains on each column's Series separately.)
m = df_test.apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
df_test.loc[m, ["latitude", "longitude"]] = np.nan
# Output :
print(df_test)
latitude longitude test
0 40.765004 -73.965961 1
1 NaN NaN 2
2 NaN NaN 3
3 -34.283007 -118.439891 4
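Note that the mask above runs the regex on every column, including test; that is harmless here because test holds plain integers, but a variant that only inspects the coordinate columns (the same mask the question already builds, combined with .loc) would be:
# build the mask from the two coordinate columns only, so other columns are never tested
m = df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
# set both coordinate columns to NaN for any row where either one matched the pattern
df_test.loc[m, ['latitude', 'longitude']] = np.nan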
CodePudding user response:
You could just assign the filtered frame back to the two columns. The assignment aligns on the row index, so rows 1 and 2, which are missing from the filtered right-hand side, become NaN:
mask = df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
df_test[['latitude', 'longitude']] = df_test.loc[~mask, ['latitude', 'longitude']]
Output (df_test[['latitude', 'longitude']]):
latitude longitude
0 40.765004 -73.965961
1 NaN NaN
2 NaN NaN
3 -34.283007 -118.439891
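For completeness, the test column is untouched by this assignment, so the full frame matches the result asked for in the question:
print(df_test)
latitude longitude test
0 40.765004 -73.965961 1
1 NaN NaN 2
2 NaN NaN 3
3 -34.283007 -118.439891 4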
CodePudding user response:
Build your condition as cond1:
cond1 = ~df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
then use pandas where on the two coordinate columns and assign the result back (calling where on the whole frame would also blank out test):
df_test[['latitude', 'longitude']] = df_test[['latitude', 'longitude']].where(cond1)
df_test[['latitude', 'longitude']] is then:
latitude longitude
0 40.765004 -73.965961
1 NaN NaN
2 NaN NaN
3 -34.283007 -118.439891
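As a side note (a sketch, not part of the original answer): DataFrame.mask is the complement of where (it puts NaN where the condition is True), so the same result can be written without negating the condition:
# rows where either coordinate column matches the regex
matched = df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
# mask() replaces the values with NaN where 'matched' is True
df_test[['latitude', 'longitude']] = df_test[['latitude', 'longitude']].mask(matched)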