Home > Software design >  Python drop columns in string range
Python drop columns in string range

Time:09-27

I want to drop all columns whose name starts by 'var' and whose content is 'None'. Sample of my dataframe:

id var1 var2 newvar1 var3 var4 newvar2
1  x    y    dt      None f    None

Dataframe that I want:

id var1 var2 newvar1 var4 newvar2
1  x    y    dt      f    None

I want to do this for several files and I do not know how many 'var' I have in all of them. My dataframe has only one row. Here is the code that I tried:

for i in range(1,300):
    df.drop(df.loc[df['var' str(i)] == 'None' ].index, inplace=True)

Error obtained:

KeyError: 'var208'

I also tried:

df.drop(df.loc[df['var' str(i) for i in range(1,300)] == 'None'].index, inplace=True)
SyntaxError: invalid syntax

Could anyone help me improve my code?

CodePudding user response:

Your error occurs because you have no column with that name. You can use df.columns to get a list of available columns, check if the name .startswith("var") and use df[col].isnull().all() to check if all values are None.

import pandas as pd

df = pd.DataFrame(columns=["id", "var1", "var2", "newvar1", "var3", "var4", "newvar2"],
                  data=[[1, "x", "y", "dt", None, "f", None]])


df.drop([col for col in df.columns if col.startswith("var") and df[col].isnull().all()], axis=1, inplace=True)

CodePudding user response:

Let's try

out = df.drop(columns=df.filter(regex='^var').isna().all().pipe(lambda s: s.index[s]))
print(out)

   id var1 var2 newvar1 var4 newvar2
0   1    x    y      dt    f    None

Step by step explanation

out = df.drop(columns=(df.filter(regex='^var')       # get columns where column header starts with var
                       .isna()                       # is the value none
                       .all()                        # is the whole column none
                       .pipe(lambda s: s.index[s]))  # get the index name where the whole column is none )
  • Related