I have the following dataframe:
import pandas as pd
import numpy as np

d_test = {
    'latitude': [40.765004, 45.504641, 40.00000, -34.283007],
    'longitude': [-73.965961, 122.000000, -90.121679, -118.439891],
    'test': [1, 2, 3, 4],
}
df_test = pd.DataFrame(d_test)
I want to use the regex r'^[ -]?[0-9]*[.][0]*?$' to put np.nan where the condition matches.
I know how to filter the matching rows out completely using this regex:
df_test[~df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)]
the code above gives:
latitude longitude test
0 40.765004 -73.965961 1
3 -34.283007 -118.439891 4
but I need the following result:
latitude longitude test
0 40.765004 -73.965961 1
1 NaN NaN 2
2 NaN NaN 3
3 -34.283007 -118.439891 4
I tried to use pandas' where:
~df_test[['latitude', 'longitude']].where(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True))
but it does not work (AttributeError: 'DataFrame' object has no attribute 'str').
CodePudding user response:
You just need boolean indexing with pandas.DataFrame.loc. (The .str accessor only exists on a Series, which is why your where attempt raised the AttributeError; apply runs str.contains on each column's Series separately.)
m = df_test.apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
df_test.loc[m, ["latitude", "longitude"]] = np.nan
# Output :
print(df_test)
latitude longitude test
0 40.765004 -73.965961 1
1 NaN NaN 2
2 NaN NaN 3
3 -34.283007 -118.439891 4
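Note that the mask above runs the regex on every column, including test; that is harmless here because test holds plain integers, but a variant that only inspects the coordinate columns (the same mask the question already builds, combined with .loc) would be:
# build the mask from the two coordinate columns only, so other columns are never tested
m = df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
# set both coordinate columns to NaN for any row where either one matched the pattern
df_test.loc[m, ['latitude', 'longitude']] = np.nan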
CodePudding user response:
You could just assign the filtered frame back to the two columns. The assignment aligns on the row index, so rows 1 and 2, which are missing from the filtered right-hand side, become NaN:
mask = df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
df_test[['latitude', 'longitude']] = df_test.loc[~mask, ['latitude', 'longitude']]
Output (df_test[['latitude', 'longitude']]):
latitude longitude
0 40.765004 -73.965961
1 NaN NaN
2 NaN NaN
3 -34.283007 -118.439891
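For completeness, the test column is untouched by this assignment, so the full frame matches the result asked for in the question:
print(df_test)
latitude longitude test
0 40.765004 -73.965961 1
1 NaN NaN 2
2 NaN NaN 3
3 -34.283007 -118.439891 4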
CodePudding user response:
Build your condition as cond1:
cond1 = ~df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
then use pandas where on the two coordinate columns and assign the result back (calling where on the whole frame would also blank out test):
df_test[['latitude', 'longitude']] = df_test[['latitude', 'longitude']].where(cond1)
df_test[['latitude', 'longitude']] is then:
latitude longitude
0 40.765004 -73.965961
1 NaN NaN
2 NaN NaN
3 -34.283007 -118.439891
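As a side note (a sketch, not part of the original answer): DataFrame.mask is the complement of where (it puts NaN where the condition is True), so the same result can be written without negating the condition:
# rows where either coordinate column matches the regex
matched = df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[ -]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
# mask() replaces the values with NaN where 'matched' is True
df_test[['latitude', 'longitude']] = df_test[['latitude', 'longitude']].mask(matched)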