Remove all strings with less than n characters from data frame with exception (Python)

Time:06-30

Input

data = {'ID':[1,2,3,4,5,6,7], 'Column_A':['This', 'Is', 'A','Test', '•  ', '•', '•test']}
df_in = pd.DataFrame(data)

Output

data = {'ID':[1,4,5,6,7], 'Column_A':['This', 'Test', '•  ', '•', '•test']}
df_out = pd.DataFrame(data)

Problem: I want to remove the rows of my data frame whose Column_A entry has fewer than 3 characters, but exempt entries containing a bullet point from that rule. Using

df_in = df_in[df_in['Column_A'].str.len()>3]

I figured out how to delete the short entries in a column. However, I want to keep every entry containing the bullet symbol ("•"), regardless of how many characters it has (e.g. if it stands alone, is followed by some spaces, or even has some text after it).

CodePudding user response:

import pandas as pd

data = {'ID':[1,2,3,4,5,6,7], 'Column_A':['This', 'Is', 'A','Test', '•  ', '•', '•test']}
df_in = pd.DataFrame(data)

# here is the condition you are looking for:
# keep a row if its text is longer than 3 characters OR contains a bullet
df_in = df_in[(df_in['Column_A'].str.len() > 3) | (df_in['Column_A'].str.contains('•'))]

print(df_in.head())
   ID Column_A
0   1     This
3   4     Test
4   5      •  
5   6        •
6   7    •test
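
The same idea can be wrapped in a small helper. This is only a sketch: the function name `drop_short_strings` and its parameters `min_len` and `keep_marker` are made up for illustration and are not part of pandas. It assumes the goal is to drop strings with strictly fewer than `min_len` characters, so it compares with `>=` (the `> 3` test above would also drop three-character strings). It also passes `regex=False` and `na=False` so that the marker is matched literally and missing values do not break the boolean mask.

```python
import pandas as pd

def drop_short_strings(df, column, min_len=3, keep_marker="•"):
    """Keep rows whose text has at least min_len characters or contains keep_marker.

    Hypothetical helper for illustration, not a pandas API.
    """
    text = df[column]
    # For missing values the length is NaN, and NaN >= min_len evaluates to False
    long_enough = text.str.len() >= min_len
    # regex=False matches the marker literally; na=False maps missing values to False
    has_marker = text.str.contains(keep_marker, regex=False, na=False)
    return df[long_enough | has_marker]

data = {'ID': [1, 2, 3, 4, 5, 6, 7],
        'Column_A': ['This', 'Is', 'A', 'Test', '•  ', '•', '•test']}
df_in = pd.DataFrame(data)

df_out = drop_short_strings(df_in, 'Column_A')
print(df_out)
```

For the sample data this keeps the rows with ID 1, 4, 5, 6 and 7, the same rows as in the desired output.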

CodePudding user response:

Maybe something like this?

df_out = df_in[(df_in['Column_A'].str.len() > 3) | (df_in['Column_A'].str.startswith('•'))]
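
Note that `str.startswith` and `str.contains` are not interchangeable: `startswith` only matches when the bullet is the very first character, while `contains` matches it anywhere in the string. Which one fits depends on the data. A minimal sketch of the difference, using made-up sample values that are not from the question:

```python
import pandas as pd

# Hypothetical sample values chosen to show where the two checks disagree
s = pd.Series(['•x', 'x•', ' •', 'ab'])

contains_mask = s.str.contains('•', regex=False)      # bullet anywhere in the string
startswith_mask = s.str.startswith('•')               # bullet as the first character only
lstrip_mask = s.str.lstrip().str.startswith('•')      # ignore leading whitespace first

print(pd.DataFrame({'value': s,
                    'contains': contains_mask,
                    'startswith': startswith_mask,
                    'lstrip+startswith': lstrip_mask}))
```

Here 'x•' is matched only by `contains`, and ' •' is missed by plain `startswith` unless leading whitespace is stripped first.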