I have a dataframe with name and address/email information, depending on the type. Based on the Type column I want to concatenate either Name + City or Name + Email into a new column (concat_name) in the dataframe. Some of the types are null and are causing ambiguity errors. Identifying and handling the nulls correctly in place is where I'm having trouble.
import pandas as pd

NULL = None
data = {
    'Type': [NULL, 'MasterCard', 'Visa', 'Amex'],
    'Name': ['Chris', 'John', 'Jill', 'Mary'],
    'City': ['Tustin', 'Cleveland', NULL, NULL],
    'Email': [NULL, NULL, '[email protected]', '[email protected]']
}
df_data = pd.DataFrame(data)
# Expected resulting df column:
df_data['concat_name'] = ['ChrisTustin', 'JohnCleveland', '[email protected]', '[email protected]']
**Attempt one using booleans**
if df_data['Type'].isnull() | (df_data['Type'] == 'Mastercard'):
    df_data['concat_name'] = df_data['Name'] + df_data['City']
if (df_data['Type'] == 'Visa') | (df_data['Type'] == 'Amex'):
    df_data['concat_name'] = df_data['Name'] + df_data['Email']
else:
    df_data['concat_name'] = 'Error'
Error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
**Attempt two using np.where**
df_data['concat_name'] = np.where((df_data['Type'].isna() | (df_data['Type'] == 'MasterCard'),
                                   df_data['Name'] + df_data['City'],
                                   np.where((df_data['Type'] == "Visa") | (df_data['Type'] == "Amex"),
                                            df_data['Name'] + df_data['Email'],
                                            'Error')))
Error
ValueError: Length of values (2) does not match length of index (12000)
CodePudding user response:
Does the following code solve your use case?
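For context on the first error: a boolean Series cannot be used directly in an if statement, because Python needs a single True/False value, not one per row. That is why the solution below uses np.where, which evaluates the condition element-wise. A tiny illustration of the failure, separate from your data:
import pandas as pd

s = pd.Series([True, False])
# if s:   # raises ValueError: The truth value of a Series is ambiguous
#     ...
# Reducing the Series to a single boolean works:
print(s.any(), s.all())  # True False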
# == Imports needed ===========================
import pandas as pd
import numpy as np
# == Example Dataframe =========================
df_data = pd.DataFrame(
    {
        "Type": [None, "MasterCard", "Visa", "Amex"],
        "Name": ["Chris", "John", "Jill", "Mary"],
        "City": ["Tustin", "Cleveland", None, None],
        "Email": [None, None, "[email protected]", "[email protected]"],
        # Expected output:
        "concat_name": [
            "ChrisTustin",
            "JohnCleveland",
            "[email protected]",
            "[email protected]",
        ],
    }
)
# == Solution Implementation ====================
df_data["concat_name2"] = np.where(
(df_data["Type"].isin(["MasterCard", pd.NA, None])),
df_data["Name"].astype(str).replace("None", "")
df_data["City"].astype(str).replace("None", ""),
np.where(
(df_data["Type"].isin(["Visa", "Amex"])),
df_data["Name"].astype(str).replace("None", "")
df_data["Email"].astype(str).replace("None", ""),
"Error",
),
)
# == Expected Output ============================
print(df_data)
# Prints:
# Type Name City Email concat_name concat_name2
# 0 None Chris Tustin None ChrisTustin ChrisTustin
# 1 MasterCard John Cleveland None JohnCleveland JohnCleveland
# 2 Visa Jill None [email protected] [email protected] [email protected]
# 3 Amex Mary None [email protected] [email protected] [email protected]
**Notes**
You might also consider simplifying the problem by replacing the first condition (Type is 'MasterCard' or null) with the opposite of your second condition (Type is 'Visa' or 'Amex'):
df_data["concat_name2"] = np.where(
(~df_data["Type"].isin(["Visa", "Amex"])),
df_data["Name"].astype(str).replace("None", "")
df_data["City"].astype(str).replace("None", ""),
df_data["Name"].astype(str).replace("None", "")
df_data["Email"].astype(str).replace("None", "")
)
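If you prefer, the same result can be reached without the astype(str).replace("None", "") round trip by filling the nulls with empty strings first via fillna(""). A minimal sketch of that variant, assuming the same df_data as above (concat_name3 is just an illustrative column name):
# Fill nulls with empty strings so concatenation never produces the text "None"
name = df_data["Name"].fillna("")
city = df_data["City"].fillna("")
email = df_data["Email"].fillna("")

df_data["concat_name3"] = np.where(
    df_data["Type"].isna() | (df_data["Type"] == "MasterCard"),
    name + city,
    np.where(df_data["Type"].isin(["Visa", "Amex"]), name + email, "Error"),
)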
Additionally, if you are dealing with messy data, you can make the implementation more robust by converting the Type column to lowercase (or uppercase) before comparing. That way the code also accounts for values like "mastercard", "Mastercard", etc.:
df_data["concat_name2"] = np.where(
(df_data["Type"].astype(str).str.lower().isin(["mastercard", pd.NA, None, "none"])),
df_data["Name"].astype(str).replace("None", "")
df_data["City"].astype(str).replace("None", ""),
np.where(
(df_data["Type"].astype(str).str.lower().isin(["visa", "amex"])),
df_data["Name"].astype(str).replace("None", "")
df_data["Email"].astype(str).replace("None", ""),
"Error",
),
)
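Finally, if the nested np.where calls become hard to read as more card types show up, the same branching can be written with np.select, which takes a list of conditions, a list of matching outputs, and a default value. A minimal sketch along the lines of the case-insensitive version above (same df_data assumed):
type_lower = df_data["Type"].astype(str).str.lower()  # nulls become the string "none"
conditions = [
    type_lower.isin(["mastercard", "none"]),
    type_lower.isin(["visa", "amex"]),
]
choices = [
    df_data["Name"].fillna("") + df_data["City"].fillna(""),
    df_data["Name"].fillna("") + df_data["Email"].fillna(""),
]
# First matching condition wins; rows matching neither get the default
df_data["concat_name2"] = np.select(conditions, choices, default="Error")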