How to select rows where number of characters different from 3 or contains at least 1 letter or no d

Time:07-07

I have a DataFrame in Python pandas like the one below (the data type of col1 is string):

col1
-------
1112224
222
111
123DDT
985T
NaN

I need to select only the rows where:

  • the number of characters is different from 3

or

  • the value contains at least 1 letter

or

  • there is no data (NaN)

As a result, I need something like this:

col1
-------
123DDT
985T
NaN

How can I do that in pandas?

CodePudding user response:

You can build one boolean mask per condition and combine them with boolean indexing:

# not 3 chars
m1 = df['col1'].str.len().ne(3)
# at least one letter (na=False keeps this mask strictly boolean for NaN rows)
m2 = df['col1'].str.contains('[a-zA-Z]', na=False)
# or NaN
m3 = df['col1'].isna()

# any of the above conditions
out = df[m1|m2|m3]

NB. The third condition is already covered by the first one (a NaN does not have 3 characters*, so m1 is True for it), so you could skip m3.

output:

      col1
0  1112224
3   123DDT
4     985T
5      NaN

* although, ironically, the name "NaN" does ;)
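
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It assumes col1 is an object-dtype column of Python strings with one np.nan, as in the question. The DataFrame construction and the names out_without_m3 and alt are my own additions for illustration. The last line uses str.fullmatch as an alternative that is not part of the answer above: it keeps every row that is not exactly three non-letter characters.

```python
import numpy as np
import pandas as pd

# Sample data from the question; object dtype is forced here (an assumption)
# so that missing values behave as plain NaN.
df = pd.DataFrame(
    {"col1": pd.Series(["1112224", "222", "111", "123DDT", "985T", np.nan], dtype=object)}
)

# Length differs from 3. For NaN, str.len() yields NaN, and NaN != 3 is True.
m1 = df["col1"].str.len().ne(3)
# At least one ASCII letter; na=False maps the NaN row to False.
m2 = df["col1"].str.contains("[a-zA-Z]", na=False)
# Missing value.
m3 = df["col1"].isna()

# Keep rows matching any of the three conditions.
out = df[m1 | m2 | m3]
print(out)

# m1 already flags the NaN row, so dropping m3 selects the same rows here.
out_without_m3 = df[m1 | m2]

# Alternative (not from the answer): negate "exactly three non-letter characters".
alt = df[~df["col1"].str.fullmatch("[^a-zA-Z]{3}", na=False)]
```

All three selections should contain the rows with index 0, 3, 4 and 5. If col1 used the nullable "string" dtype instead, the comparison in m1 could produce a missing value for the NaN row, so keeping m3 would be the safer choice in that case.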
