I want to drop all columns whose name starts by 'var' and whose content is 'None'. Sample of my dataframe:
id var1 var2 newvar1 var3 var4 newvar2
1 x y dt None f None
Dataframe that I want:
id var1 var2 newvar1 var4 newvar2
1 x y dt f None
I want to do this for several files and I do not know how many 'var' I have in all of them. My dataframe has only one row. Here is the code that I tried:
for i in range(1,300):
df.drop(df.loc[df['var' str(i)] == 'None' ].index, inplace=True)
Error obtained:
KeyError: 'var208'
I also tried:
df.drop(df.loc[df['var' str(i) for i in range(1,300)] == 'None'].index, inplace=True)
SyntaxError: invalid syntax
Could anyone help me improve my code?
CodePudding user response:
Your error occurs because you have no column with that name.
You can use df.columns to get a list of available columns, check if the name .startswith("var")
and use df[col].isnull().all()
to check if all values are None.
import pandas as pd
df = pd.DataFrame(columns=["id", "var1", "var2", "newvar1", "var3", "var4", "newvar2"],
data=[[1, "x", "y", "dt", None, "f", None]])
df.drop([col for col in df.columns if col.startswith("var") and df[col].isnull().all()], axis=1, inplace=True)
CodePudding user response:
Let's try
out = df.drop(columns=df.filter(regex='^var').isna().all().pipe(lambda s: s.index[s]))
print(out)
id var1 var2 newvar1 var4 newvar2
0 1 x y dt f None
Step by step explanation
out = df.drop(columns=(df.filter(regex='^var') # get columns where column header starts with var
.isna() # is the value none
.all() # is the whole column none
.pipe(lambda s: s.index[s])) # get the index name where the whole column is none )