How to compare column values when Dtype is object array in Pandas dataframe

Time:07-19

Suppose I have a simple table with 5 columns. In the last 2 rows, the values are either empty or have been set to 'Undefined':

import pandas as pd
# initialize data of lists.
data = {'id': [12, 45, 13, 32, 43, 38],
    'org_type': [1, 2, 3, 1, None, None], 
    'org_designation': [1, 2, 3, 4, None, None], 
    'org_type_desc': ['A', 'B', 'C', 'A', 'Undefined', 'Undefined'],
    'org_designation_desc': ['D', 'E', 'F', 'G', 'Undefined', '']
}

# Create DataFrame
df = pd.DataFrame(data)

Suppose I want to run a function that checks some of the values and replaces some of the values of certain columns, and drops other columns.

def desc_field_check_replace(dataframe):
    '''This function takes a dataframe, checks for field
    ending in _desc and a corresponding field without it. It then replaces the
    base field with the _desc value and drops the desc_field'''
    new_df = dataframe
    desc_cols = [col for col in new_df.columns if '_desc' in col]
    for desc_col in desc_cols:
        for col in new_df.columns:
            if col + '_desc' == desc_col:
                if new_df[desc_col] == 'Undefined':
                    new_df[col] = ''
                else:
                    new_df[col] = new_df[desc_col]
                new_df = new_df.drop([desc_col], axis=1)             
           
    new_df = new_df.sort_values(new_df.columns[0])
    return new_df

#Run function and create modified dataframe
df1 = desc_field_check_replace(df)

I've tried changing this function around, but I generally get an error like this: TypeError: Cannot perform 'ror_' with a dtyped [object] array and scalar of type [bool]. What am I doing wrong here? I just want to replace the values in one column based on the values in the corresponding '_desc' column. How can I achieve this? My desired output would look like this:

[image: desired output table]

CodePudding user response:

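When you compare a Series to a value (new_df[desc_col] == 'Undefined'), you get a boolean Series with one True/False per row. You can't use that whole Series as the condition of an if statement, because pandas cannot reduce it to a single truth value and raises an error.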
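To make that concrete, here is a minimal sketch on a small, made-up Series (the name desc and its values are only for illustration). It shows what the comparison returns, that using it in an if raises a ValueError, and one way to use the boolean Series as a mask instead:

```python
import pandas as pd

# Illustrative data only, not the asker's full DataFrame.
desc = pd.Series(['A', 'B', 'Undefined'])

# Element-wise comparison returns a boolean Series, not a single bool.
is_undefined = desc == 'Undefined'
print(is_undefined.tolist())

# Using the whole Series as an `if` condition raises ValueError,
# because its truth value is ambiguous.
try:
    if is_undefined:
        pass
except ValueError as exc:
    print(type(exc).__name__)

# Use the boolean Series as a mask instead: rows where the mask is True
# are replaced with '', all other rows keep their value.
cleaned = desc.mask(is_undefined, '')
print(cleaned.tolist())
```

Series.mask is just one option here; boolean indexing with .loc would work along the same lines.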

You can use the .str string accessor instead to do the replacement for the whole column at once:

def desc_field_check_replace(dataframe):
    '''This function takes a dataframe, checks for field
    ending in _desc and a corresponding field without it. It then replaces the
    base field with the _desc value and drops the desc_field'''
    new_df = dataframe.copy()  # copy so the caller's DataFrame is not modified
    desc_cols = [col for col in new_df.columns if '_desc' in col]
    for desc_col in desc_cols:
        for col in new_df.columns:
            if col + '_desc' == desc_col:
                new_df[col] = new_df[desc_col].str.replace('Undefined', '')
                new_df = new_df.drop([desc_col], axis=1)             
           
    new_df = new_df.sort_values(new_df.columns[0])
    return new_df
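
For reference, the same idea can be sketched without the inner loop by deriving the base column name from the _desc column. This is only a sketch: the helper name replace_with_desc and its suffix parameter are made up here, and it uses Series.replace for an exact-match replacement rather than .str.replace. The data is the table from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [12, 45, 13, 32, 43, 38],
    'org_type': [1, 2, 3, 1, None, None],
    'org_designation': [1, 2, 3, 4, None, None],
    'org_type_desc': ['A', 'B', 'C', 'A', 'Undefined', 'Undefined'],
    'org_designation_desc': ['D', 'E', 'F', 'G', 'Undefined', ''],
})

def replace_with_desc(dataframe, suffix='_desc'):
    """Hypothetical helper: overwrite each base column with its *_desc
    column (with 'Undefined' turned into ''), then drop the *_desc column."""
    out = dataframe.copy()  # leave the caller's DataFrame untouched
    for desc_col in [c for c in out.columns if c.endswith(suffix)]:
        base = desc_col[:-len(suffix)]
        if base in out.columns:
            # Exact-match replacement of the whole cell value.
            out[base] = out[desc_col].replace('Undefined', '')
            out = out.drop(columns=desc_col)
    # Sort by the first column (id), as in the original function.
    return out.sort_values(out.columns[0])

result = replace_with_desc(df)
print(result)
```

Because the function works on a copy, df still has its _desc columns afterwards, and only result has them removed.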

CodePudding user response:

One of the easiest ways to replace values is through map (column-wise):

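import numpy as np
# x is the name of the column to remap; values that are not keys of d become NaN
d = {'A': 'new_A', 'B': 'B', 'C': 'C', 'Undefined': np.nan}
df[x] = df[x].map(d)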
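
One thing to keep in mind with map and a dict: any value that is not a key in the dict becomes NaN. If you only want to change the listed values and keep everything else, Series.replace with the same dict is an alternative. A small sketch on made-up data (the Series s and the value 'Z' are only for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'Undefined', 'Z'])
d = {'A': 'new_A', 'Undefined': np.nan}

# map: every value must be a key in d; 'Z' is not, so it becomes NaN too.
mapped = s.map(d)
print(mapped.isna().tolist())

# replace: only the listed values change; 'Z' is left as it is.
replaced = s.replace(d)
print(replaced.isna().tolist())
```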