Input
data = {'ID':[1,2,3,4,5,6,7], 'Column_A':['This', 'Is', 'A','Test', '• ', '•', '•test']}
df_in = pd.DataFrame(data)
Output
data = {'ID':[1,4,5,6,7], 'Column_A':['This', 'Test', '• ', '•', '•test']}
df_out = pd.DataFrame(data)
Problem: I want to remove rows from my data frame whose entry in a column has fewer than 3 characters, but exempt entries containing a bullet point from that rule. Using
df_in = df_in[df_in['Column_A'].str.len()>3]
I figured out how to drop all rows whose entry in that column is too short. However, I want to keep every entry that contains the bullet symbol ("•"), regardless of how many characters it has (e.g. whether the bullet stands alone, is followed by spaces, or is followed by some text).
CodePudding user response:
import pandas as pd
data = {'ID':[1,2,3,4,5,6,7], 'Column_A':['This', 'Is', 'A','Test', '• ', '•', '•test']}
df_in = pd.DataFrame(data)
# here is the condition you are looking for
df_in = df_in[(df_in['Column_A'].str.len() > 3) | (df_in["Column_A"].str.contains("•"))]
print(df_in.head())
   ID Column_A
0   1     This
3   4     Test
4   5       •
5   6        •
6   7    •test
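If the real column can contain missing values, the filter above may need a little hardening. The following is only a sketch under that assumption: the extra row with `None` (ID 8) is invented for illustration and is not part of the question's data. `na=False` makes `str.contains` treat missing values as "no bullet", and `regex=False` matches the bullet as a literal string rather than as a regular expression.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical missing value (ID 8)
data = {
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'Column_A': ['This', 'Is', 'A', 'Test', '• ', '•', '•test', None],
}
df_in = pd.DataFrame(data)

col = df_in['Column_A']
# True where the entry has more than 3 characters (missing values compare as False)
long_enough = col.str.len() > 3
# True where the entry contains a bullet; missing values count as "no bullet"
has_bullet = col.str.contains('•', regex=False, na=False)

# Keep a row if either condition holds
df_out = df_in[long_enough | has_bullet]
print(df_out['ID'].tolist())
```

Note that `> 3` also drops entries with exactly 3 characters. If only entries with fewer than 3 characters should go, `>= 3` would be the matching comparison; the sample data gives the same result either way.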
CodePudding user response:
Maybe something like this?
df_out = df_in[(df_in['Column_A'].str.len() > 3) | (df_in['Column_A'].str.startswith('•'))]
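The two answers agree on the question's sample data, but `str.contains` and `str.startswith` are not interchangeable in general. Here is a small sketch of where they differ; the values in `s` are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical values chosen to show where the two conditions disagree
s = pd.Series(['•x', ' •', 'a•', 'ab'])

# Bullet anywhere in the string
contains = s.str.contains('•', regex=False)
# Bullet as the very first character
startswith = s.str.startswith('•')
# Strip leading whitespace first so an indented bullet still counts
startswith_stripped = s.str.lstrip().str.startswith('•')

print(contains.tolist())             # [True, True, True, False]
print(startswith.tolist())           # [True, False, False, False]
print(startswith_stripped.tolist())  # [True, True, False, False]
```

Which one fits depends on the data: `contains` keeps any entry with a bullet anywhere in it, while `startswith` only keeps entries that begin with one.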