Fill in column based on values in other columns pandas-CodePudding

I'm trying to determine the value of a column based on the value of 3 other columns.

The logic is, if ColC contains the string value 'Deck', populate the new column (ColD) with the value 'Deck'. If not, check ColB for the word 'Deck'. If it's not there, check ColA. If not there, just set ColD to empty.

However, I then want to repeat the same process, but for the word 'Beam'. Then again for the word 'Wall', then one last time with the word 'Grating'.

Here's a snippet of the data:

ColA    ColB                  ColC
Deck    DECK PENETRATION        
        DECK        
Deck            
        WALL PENETRATION        
        PRIMARY GRATING                 
Beam                          PRIMARY BEAM PENE

This is what the above should end up looking like:

ColA    ColB                 ColC                       ColD
Deck    DECK PENETRATION                                Deck
        DECK                                            Deck
Deck                                                    Deck
        WALL PENETRATION                                Wall
        PRIMARY GRATING                                 Grating
Beam                         PRIMARY BEAM PENE          Beam

There may be other words I'd like to check against ColA-ColC, so ideally the solution would involve passing a list of words to a function of some sort. The list right now would look like ['Deck', 'Beam', 'Wall', 'Grating']

Although unlikely, if the code finds that one column (e.g. ColA) value has the word 'Grating' and another column (e.g. ColB) has the word 'Deck' - it would just fill ColD with the word 'Conflict'.

I think I can do this using np.where but I'm struggling to set something up that works. For example, if the code is checking for the word 'Grating' and doesn't find it, I don't want it to overwrite ColD with nothing, where it may have already been populated with the word Beam, Deck, etc.

Edit:

So something like this almost works if I use in conjunction with df.apply setting axis = 1.

However, in the new column, it only contains values 'Deck' and 'Unset' after running it. Something is off with my logic and understanding of how apply works.

def my_function(row):
    li = ['Deck', 'Grating', 'Wall', 'Beam']
    for x in li:
        if x.lower() in row.A.lower():
            return x
        elif x.lower() in row.B.lower():
            return x
        elif x.lower() in row.C.lower():
            return x
        else:
            return 'unset' 


df['ColD'] = df.apply(my_function, axis = 1)

CodePudding user response：

Here's one approach. I changed your input so that the first row would lead to a conflict

import pandas as pd
import io

df = pd.read_csv(io.StringIO(
"""
ColA,ColB,ColC
Deck,DECK PENETRATION, BEAM
,DECK,
Deck,,
,WALL PENETRATION,
,PRIMARY GRATING,
Beam,,PRIMARY BEAM PENE
"""))

df['ColD'] = None

words = ['Deck', 'Beam', 'Wall', 'Grating']

#join the rows together as strings to search through
row_text = df.fillna('').sum(axis=1).str.lower()

for word in words:
    contains_inds = row_text.str.contains(word.lower())
    
    inds_to_fill = contains_inds & df['ColD'].isnull()
    inds_to_conflict = contains_inds & df['ColD'].notnull()
    
    df.loc[inds_to_fill,'ColD'] = word
    df.loc[inds_to_conflict,'ColD'] = 'Conflict'
    
df