I have a DataFrame in pandas like the one below (the data type of col1 is string):
col1
-------
1112224
222
111
123DDT
985T
NaN
I need to select only the rows where:
- the number of characters is different from 3
or
- the value contains at least 1 letter
or
- there is no data (NaN)
So as a result I need something like the following:
col1
-------
123DDT
985T
NaN
How can I do that in pandas?
CodePudding user response:
You can easily use masks and boolean indexing:
# not 3 chars
m1 = df['col1'].str.len().ne(3)
# at least one letter (na=False makes NaN rows False instead of NaN, so the mask stays boolean)
m2 = df['col1'].str.contains('[a-zA-Z]', na=False)
# or NaN
m3 = df['col1'].isna()
# any of the above conditions
out = df[m1|m2|m3]
NB. the third condition is already covered by the first one (the length of NaN is NaN, which is not equal to 3*), so you could skip m3.
output:
      col1
0  1112224
3   123DDT
4     985T
5      NaN
* although, ironically, the name "NaN" does ;)
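For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption based on the sample in the question (object dtype, with one real missing value), and the name out_short is only illustrative. It also checks the point from the NB: dropping m3 gives the same rows.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed: object dtype, one missing value)
df = pd.DataFrame(
    {"col1": ["1112224", "222", "111", "123DDT", "985T", np.nan]},
    dtype="object",
)

# Not 3 chars: str.len() gives NaN for the missing row, and NaN != 3 is True
m1 = df["col1"].str.len().ne(3)
# At least one ASCII letter; na=False keeps the mask strictly boolean
m2 = df["col1"].str.contains("[a-zA-Z]", na=False)
# Missing value
m3 = df["col1"].isna()

# Any of the three conditions
out = df[m1 | m2 | m3]

# Same selection without m3, since m1 already flags the NaN row
out_short = df[m1 | m2]

print(out)
```

Note that '[a-zA-Z]' only matches ASCII letters; if the column can hold accented or non-Latin letters, the pattern would need adjusting.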