Home > database >  check if dataframe columns are not null and in a list
check if dataframe columns are not null and in a list

Time:06-08

I have a list with strings and a dataframe. I want to check that all the list items are on the dataframe columns. If they are, I want to check that the column is not empty.

for example: I have this list

pred=["col1", "col2"]

and these dataframes

df1=pd.DataFrame({'col1':['', 'element','element2'], 'col2':[20, 30, 43]})
df2=pd.DataFrame({'aaa':['', '',''], 'col1':['one', 'two', 'three'], 'col2':[5, 8, 6]})
df3=pd.DataFrame({'col1':['', '',''], 'col2':[8, 9, 5]})
df4=pd.DataFrame({'whatever':[5,8,9], 'bbb':['','','']})

df1 has the same column name as pred and none of them are empty. This one should return True. df2 also has the same columns as pred, none of them are empty either. This one is also True. df3 has the same columns, but one is empty, this df is False. df4 has nothing in common, so it should be False.

I am doing some gymnastics to just get if the columns are the same:

f = pd.read_csv(df1)
l = []
for i in pred:
    if i in f.columns:
        l.append(i)
        
len(pred)==len(l)

or to check the name of the null columns f.columns[(f.isnull().sum()/len(f)) ==1].tolist()

But I have no clue in how to join this two ideas together.

CodePudding user response:

You could do something like this:

if the columns contain empty strings '':

res = []
for df in [df1,df2,df3,df4]:
    if (set(pred).issubset(df.columns) and df[pred].ne('').any().all()):
        res.append(True)
    else:
        res.append(False)
print(res)

[True, True, False, False]

if the columns contains NaN or None you have to replace ne('') with notna().

If you want it as one-liner with list comprehension:

res = [True if (set(pred).issubset(df.columns) and df[pred].notna().any().all()) else False for df in [df1,df2,df3,df4]]
  • Related