Keep columns if column contains string

Time:06-30

A similar question has been answered at "How to keep a row if any column contains a certain substring?". However, my problem involves multiple dataframes within a list, which is a different set-up from the other post. Additionally, I want to keep columns rather than rows.

I have tried every variation of the answers in that post and cannot get any of them to work for my case. Here's what I am working with:

import pandas as pd
nm = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]
def tbl(data):
    # each row is the list rotated left by x positions
    data = [data[x:] + data[:x] for x in range(1, len(data) + 1)]
    df = pd.DataFrame(data)
    return df

df_tbl = tbl(nm)
ls_comb = [df_tbl.loc[0:i] for i in range(0, len(df_tbl))]
reply_pred = [i.apply(lambda x: x.str.replace('Species', 'log(Species)')) for i in ls_comb]

Here is what I have tried:

[i[i.apply(lambda x: x.str.contains('Sepal.Width', na=False))] for i in reply_pred]

[             0    1    2    3    4
 0  Sepal.Width  NaN  NaN  NaN  NaN,
              0    1    2    3            4
 0  Sepal.Width  NaN  NaN  NaN          NaN
 1          NaN  NaN  NaN  NaN  Sepal.Width,
              0    1    2            3            4
 0  Sepal.Width  NaN  NaN          NaN          NaN
 1          NaN  NaN  NaN          NaN  Sepal.Width
 2          NaN  NaN  NaN  Sepal.Width          NaN,
              0    1            2            3            4
 0  Sepal.Width  NaN          NaN          NaN          NaN
 1          NaN  NaN          NaN          NaN  Sepal.Width
 2          NaN  NaN          NaN  Sepal.Width          NaN
 3          NaN  NaN  Sepal.Width          NaN          NaN,
              0            1            2            3            4
 0  Sepal.Width          NaN          NaN          NaN          NaN
 1          NaN          NaN          NaN          NaN  Sepal.Width
 2          NaN          NaN          NaN  Sepal.Width          NaN
 3          NaN          NaN  Sepal.Width          NaN          NaN
 4          NaN  Sepal.Width          NaN          NaN          NaN]

However, the expected output should return the entire column, for example:

[             0
 0  Sepal.Width,
               0             4
 0   Sepal.Width  Sepal.Length
 1  Petal.Length   Sepal.Width,
               0             3             4
 0   Sepal.Width  log(Species)  Sepal.Length
 1  Petal.Length  Sepal.Length   Sepal.Width
 2   Petal.Width   Sepal.Width  Petal.Length,

       .
       .
       .

Answer:

You can build a boolean mask over df.columns and then use df.loc to select only the columns where the mask is True:

dfs = [df.loc[:, df.columns[df.apply(lambda col: col.str.contains('Sepal.Width')).any()]]
       for df in reply_pred]
for df in dfs:
    print(df, '\n')

             0
0  Sepal.Width

              0             4
0   Sepal.Width  Sepal.Length
1  Petal.Length   Sepal.Width

              0             3             4
0   Sepal.Width  log(Species)  Sepal.Length
1  Petal.Length  Sepal.Length   Sepal.Width
2   Petal.Width   Sepal.Width  Petal.Length

              0             2             3             4
0   Sepal.Width   Petal.Width  log(Species)  Sepal.Length
1  Petal.Length  log(Species)  Sepal.Length   Sepal.Width
2   Petal.Width  Sepal.Length   Sepal.Width  Petal.Length
3  log(Species)   Sepal.Width  Petal.Length   Petal.Width

              0             1             2             3             4
0   Sepal.Width  Petal.Length   Petal.Width  log(Species)  Sepal.Length
1  Petal.Length   Petal.Width  log(Species)  Sepal.Length   Sepal.Width
2   Petal.Width  log(Species)  Sepal.Length   Sepal.Width  Petal.Length
3  log(Species)  Sepal.Length   Sepal.Width  Petal.Length   Petal.Width
4  Sepal.Length   Sepal.Width  Petal.Length   Petal.Width  log(Species)
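
For anyone wondering why the attempt in the question returned NaN-filled frames: indexing a dataframe with a whole boolean dataframe keeps the original shape and blanks out every non-matching cell. Reducing the boolean frame with .any() first gives one True/False per column, and that is what makes it a column selection. Below is a minimal, self-contained sketch of the same idea. The helper name keep_matching_columns is hypothetical (it is not a pandas function), and passing regex=False is my own addition so that the "." in "Sepal.Width" is matched literally rather than as a regex wildcard.

```python
import pandas as pd

# Rebuild the question's data so the sketch runs on its own.
nm = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]
rotations = [nm[x:] + nm[:x] for x in range(1, len(nm) + 1)]
df_tbl = pd.DataFrame(rotations)
ls_comb = [df_tbl.loc[0:i] for i in range(len(df_tbl))]
reply_pred = [
    d.apply(lambda col: col.str.replace("Species", "log(Species)", regex=False))
    for d in ls_comb
]


def keep_matching_columns(df, needle):
    """Hypothetical helper: keep each column in which any cell contains `needle`."""
    # regex=False treats `needle` as plain text; na=False counts missing
    # cells as non-matches instead of propagating NaN into the mask.
    hits = df.apply(lambda col: col.str.contains(needle, regex=False, na=False))
    # hits.any() collapses each column to a single boolean, i.e. a column mask.
    return df.loc[:, hits.any()]


dfs = [keep_matching_columns(d, "Sepal.Width") for d in reply_pred]
kept = [list(d.columns) for d in dfs]
print(kept)
```

The kept column labels should line up with the output printed above: column 0 for the first frame, then 0 and 4, then 0, 3 and 4, and so on until all five columns are kept.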