Suppose I have a simple table with 5 columns. In the last 2 rows, the values are either empty or have been set to 'Undefined':
import pandas as pd
# Initialize the data as a dict of lists.
data = {'id': [12, 45, 13, 32, 43, 38],
        'org_type': [1, 2, 3, 1, None, None],
        'org_designation': [1, 2, 3, 4, None, None],
        'org_type_desc': ['A', 'B', 'C', 'A', 'Undefined', 'Undefined'],
        'org_designation_desc': ['D', 'E', 'F', 'G', 'Undefined', '']
        }
# Create DataFrame
df = pd.DataFrame(data)
I want to run a function that checks the values in the '_desc' columns, replaces the values of the matching base columns, and then drops the '_desc' columns.
def desc_field_check_replace(dataframe):
    '''This function takes a dataframe, checks for fields
    ending in _desc and a corresponding field without it. It then replaces the
    base field with the _desc value and drops the _desc field.'''
    new_df = dataframe
    desc_cols = [col for col in new_df.columns if '_desc' in col]
    for desc_col in desc_cols:
        for col in new_df.columns:
            if col + '_desc' == desc_col:
                if new_df[desc_col] == 'Undefined':
                    new_df[col] = ''
                else:
                    new_df[col] = new_df[desc_col]
                new_df = new_df.drop([desc_col], axis=1)
    new_df = new_df.sort_values(new_df.columns[0])
    return new_df
# Run the function and create the modified dataframe
df1 = desc_field_check_replace(df)
I've tried changing this function around, but I generally get an error like this:
TypeError: Cannot perform 'ror_' with a dtyped [object] array and scalar of type [bool]
What am I doing wrong here? I just want to replace the values in one column based on the values in the corresponding '_desc' column. How can I achieve this? My desired output would look like this:
CodePudding user response:
When you compare a Series to a value (new_df[desc_col] == 'Undefined'), you get a boolean Series, which can't be used as a whole as the condition of an if statement.
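To make that concrete, here is a minimal sketch. The Series `s` is made-up sample data, not taken from the question. It shows that the comparison produces one boolean per row, that using the result in an `if` raises a `ValueError`, and that the same boolean Series can instead serve as a row-wise mask.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
s = pd.Series(['A', 'Undefined', 'B'])

# Element-wise comparison: one boolean per row
mask = s == 'Undefined'

# A whole Series has no single truth value, so `if` raises ValueError
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# The mask can drive a row-wise replacement instead
cleaned = s.mask(mask, '')
```

`Series.mask` replaces the entries where the condition is True and keeps the rest, which is the per-row behaviour the `if`/`else` was trying to express.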
You can use the .str accessor for the replacement instead:
def desc_field_check_replace(dataframe):
    '''This function takes a dataframe, checks for fields
    ending in _desc and a corresponding field without it. It then replaces the
    base field with the _desc value and drops the _desc field.'''
    new_df = dataframe
    desc_cols = [col for col in new_df.columns if '_desc' in col]
    for desc_col in desc_cols:
        for col in new_df.columns:
            if col + '_desc' == desc_col:
                # Copy the _desc values over, blanking out 'Undefined'
                new_df[col] = new_df[desc_col].str.replace('Undefined', '')
                new_df = new_df.drop([desc_col], axis=1)
    new_df = new_df.sort_values(new_df.columns[0])
    return new_df
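As a possible variant, the sketch below works on a copy so the caller's dataframe is not modified, derives the base column name from the suffix instead of using a nested loop, and matches 'Undefined' as a whole value rather than as a substring. The function name `replace_with_desc` and its `suffix` and `placeholder` parameters are assumptions made for this example, not part of the original code.

```python
import pandas as pd

def replace_with_desc(dataframe, suffix='_desc', placeholder='Undefined'):
    """Hypothetical variant: overwrite each base column with its *_desc
    column, blank out the placeholder, and drop the *_desc column."""
    new_df = dataframe.copy()  # leave the caller's dataframe untouched
    desc_cols = [c for c in new_df.columns if c.endswith(suffix)]
    for desc_col in desc_cols:
        base_col = desc_col[:-len(suffix)]
        if base_col in new_df.columns:
            desc = new_df[desc_col]
            # Replace only rows whose whole value equals the placeholder
            new_df[base_col] = desc.mask(desc == placeholder, '')
            new_df = new_df.drop(columns=desc_col)
    return new_df.sort_values(new_df.columns[0])

df = pd.DataFrame({
    'id': [12, 45, 13, 32, 43, 38],
    'org_type': [1, 2, 3, 1, None, None],
    'org_designation': [1, 2, 3, 4, None, None],
    'org_type_desc': ['A', 'B', 'C', 'A', 'Undefined', 'Undefined'],
    'org_designation_desc': ['D', 'E', 'F', 'G', 'Undefined', ''],
})
result = replace_with_desc(df)
```

With the sample data from the question, `result` keeps only `id`, `org_type` and `org_designation`, sorted by `id`, and the rows that held 'Undefined' end up with empty strings.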
CodePudding user response:
One of the easiest ways to replace values is through map (column-wise):
import numpy as np
d = {'A': 'new_A', 'B': 'B', 'C': 'C', 'Undefined': np.nan}
df[x] = df[x].map(d)  # x is the name of the column to remap
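One thing to watch with `map` and a dict: any value that is not a key in the dict becomes NaN. The sketch below illustrates this with a small made-up column (the value 'Z' and the single-column dataframe are assumptions for the example) and shows one way to fall back to the original value for unmapped entries.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column; 'Z' is deliberately missing from the dict
df = pd.DataFrame({'org_type_desc': ['A', 'B', 'Undefined', 'Z']})
d = {'A': 'new_A', 'B': 'B', 'Undefined': np.nan}

# Plain dict mapping: keys not found in d ('Z') turn into NaN
mapped = df['org_type_desc'].map(d)

# Fall back to the original value when the key is not in d
kept = df['org_type_desc'].map(lambda v: d.get(v, v))
```

Here `mapped` has NaN for both 'Undefined' and 'Z', while `kept` has NaN only for 'Undefined' and leaves 'Z' as it was.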