Home > other >  Iterate through multiple columns of dataframe that contain the same substring
Iterate through multiple columns of dataframe that contain the same substring

Time:11-24

Consider the following Data Frame:

Index foo String A bar String B
First 15 nan 0 3
Second 0 2 25 nan

I have the following logic that allows me to iterate through a single column of a data frame and access a specific cell and change its value. If the value of the cell in the column with the name String A is NOT NaN then the value of the cell in the column that comes before it should be changed to NaN

for idx, val in enumerate(df['String A']):
        if not math.isnan(val) :
            df.iloc[: , df.columns.get_loc('String A')-1].iloc[idx] = np.nan

Is there a way I could generalize the loop so I could iterate through all columns that contain the substring String such that the loop applies for both columns String A and String B (or any column that contains this substring).

The table should then look like this:

Index foo String A bar String B
First 15 nan nan 3
Second nan 2 25 nan

Edit: Added more details to the question

CodePudding user response:

Try:

for idx in [i for i, c in enumerate(df.columns) if c.startswith("String")]:
    df.iloc[:, idx-1] = df.iloc[:, idx-1].where(df.iloc[:,idx].isnull())

>>> df
         foo  String A   bar  String B
Index                                 
First   15.0       NaN   NaN       3.0
Second   NaN       2.0  25.0       NaN

CodePudding user response:

In order to maintain use of your original for loop you could do the following:

for idx, val in enumerate(df): 
    if 'String' in val: 
        print(val)

CodePudding user response:

You could subset the columns and use apply to iterate:

relevant = df.columns.str.contains('String')

# select the columns that come right bevor a string column
relevant = np.where(relevant)[0] - 1

# apply your function to the relevant columns
df.iloc[:, relevant] = df.iloc[:, relevant].apply(...)
  • Related