I have a dataframe with name and address/email information, depending on the type. Based on the Type column I want to concatenate either Name + City or Name + Email into a new column (concat_name) in the dataframe. Some of the types are null and are causing ambiguity errors. Identifying and handling the nulls correctly in place is where I'm having trouble.
import pandas as pd

NULL = None
data = {
    'Type': [NULL, 'MasterCard', 'Visa', 'Amex'],
    'Name': ['Chris', 'John', 'Jill', 'Mary'],
    'City': ['Tustin', 'Cleveland', NULL, NULL],
    'Email': [NULL, NULL, '[email protected]', '[email protected]']
}
df_data = pd.DataFrame(data)
# Expected resulting df column:
df_data['concat_name'] = ['ChrisTustin', 'JohnCleveland', '[email protected]', '[email protected]']
**Attempt one using booleans**
if df_data['Type'].isnull() | (df_data['Type'] == 'Mastercard'):
    df_data['concat_name'] = df_data['Name'] + df_data['City']
if (df_data['Type'] == 'Visa') | (df_data['Type'] == 'Amex'):
    df_data['concat_name'] = df_data['Name'] + df_data['Email']
else:
    df_data['concat_name'] = 'Error'
Error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
**Attempt two using np.where**
df_data['concat_name'] = np.where((df_data['Type'].isna() | (df_data['Type'] == 'MasterCard'),
                                   df_data['Name'] + df_data['City'],
                                   np.where((df_data['Type'] == "Visa") | (df_data['Type'] == "Amex"),
                                            df_data['Name'] + df_data['Email'],
                                            'Error')))
Error
ValueError: Length of values (2) does not match length of index (12000)
CodePudding user response:
Does the following code solve your use case?
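For context on the first error: a boolean Series cannot be used directly in an if statement, because Python needs a single True/False value, not one per row. That is why the solution below uses np.where, which evaluates the condition element-wise. A tiny illustration of the failure, separate from your data:
import pandas as pd

s = pd.Series([True, False])
# if s:   # raises ValueError: The truth value of a Series is ambiguous
#     ...
# Reducing the Series to a single boolean works:
print(s.any(), s.all())  # True False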
# == Imports needed ===========================
import pandas as pd
import numpy as np
# == Example Dataframe =========================
df_data = pd.DataFrame(
    {
        "Type": [None, "MasterCard", "Visa", "Amex"],
        "Name": ["Chris", "John", "Jill", "Mary"],
        "City": ["Tustin", "Cleveland", None, None],
        "Email": [None, None, "[email protected]", "[email protected]"],
        # Expected output:
        "concat_name": [
            "ChrisTustin",
            "JohnCleveland",
            "[email protected]",
            "[email protected]",
        ],
    }
)
# == Solution Implementation ====================
df_data["concat_name2"] = np.where(
(df_data["Type"].isin(["MasterCard", pd.NA, None])),
df_data["Name"].astype(str).replace("None", "")
df_data["City"].astype(str).replace("None", ""),
np.where(
(df_data["Type"].isin(["Visa", "Amex"])),
df_data["Name"].astype(str).replace("None", "")
df_data["Email"].astype(str).replace("None", ""),
"Error",
),
)
# == Expected Output ============================
print(df_data)
# Prints:
# Type Name City Email concat_name concat_name2
# 0 None Chris Tustin None ChrisTustin ChrisTustin
# 1 MasterCard John Cleveland None JohnCleveland JohnCleveland
# 2 Visa Jill None [email protected] [email protected] [email protected]
# 3 Amex Mary None [email protected] [email protected] [email protected]
**Notes**
You might also consider simplifying the problem by replacing the first condition (Type is 'MasterCard' or null) with the opposite of your second condition (Type is 'Visa' or 'Amex'):
df_data["concat_name2"] = np.where(
(~df_data["Type"].isin(["Visa", "Amex"])),
df_data["Name"].astype(str).replace("None", "")
df_data["City"].astype(str).replace("None", ""),
df_data["Name"].astype(str).replace("None", "")
df_data["Email"].astype(str).replace("None", "")
)
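If you prefer, the same result can be reached without the astype(str).replace("None", "") round trip by filling the nulls with empty strings first via fillna(""). A minimal sketch of that variant, assuming the same df_data as above (concat_name3 is just an illustrative column name):
# Fill nulls with empty strings so concatenation never produces the text "None"
name = df_data["Name"].fillna("")
city = df_data["City"].fillna("")
email = df_data["Email"].fillna("")

df_data["concat_name3"] = np.where(
    df_data["Type"].isna() | (df_data["Type"] == "MasterCard"),
    name + city,
    np.where(df_data["Type"].isin(["Visa", "Amex"]), name + email, "Error"),
)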
Additionally, if you are dealing with messy data, you can make the implementation more robust by converting the Type column to lowercase (or uppercase) before comparing. That way the code also accounts for values like "mastercard", "Mastercard", etc.:
df_data["concat_name2"] = np.where(
(df_data["Type"].astype(str).str.lower().isin(["mastercard", pd.NA, None, "none"])),
df_data["Name"].astype(str).replace("None", "")
df_data["City"].astype(str).replace("None", ""),
np.where(
(df_data["Type"].astype(str).str.lower().isin(["visa", "amex"])),
df_data["Name"].astype(str).replace("None", "")
df_data["Email"].astype(str).replace("None", ""),
"Error",
),
)
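Finally, if the nested np.where calls become hard to read as more card types show up, the same branching can be written with np.select, which takes a list of conditions, a list of matching outputs, and a default value. A minimal sketch along the lines of the case-insensitive version above (same df_data assumed):
type_lower = df_data["Type"].astype(str).str.lower()  # nulls become the string "none"
conditions = [
    type_lower.isin(["mastercard", "none"]),
    type_lower.isin(["visa", "amex"]),
]
choices = [
    df_data["Name"].fillna("") + df_data["City"].fillna(""),
    df_data["Name"].fillna("") + df_data["Email"].fillna(""),
]
# First matching condition wins; rows matching neither get the default
df_data["concat_name2"] = np.select(conditions, choices, default="Error")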