Home > Mobile >  Make a new column in a dataframe for every new instance based off another column
Make a new column in a dataframe for every new instance based off another column

Time:08-19

I am trying to make new columns in a dataframe based on how many times a new value gets paired with another column.

original dataframe:

Name Primary Address Alternative Address
John Doe 123 Fudge Rd. UHP.INC
Lee Smith Pro Sports,LLC
Hank Hill Pharm Tank.co PodRacing.Cool
Hank Hill GhoulSchool,343
Hank Hill MoneyTree Rd.

Dataframe Im trying to achieve where if there is multiple alternative addresses to one name they split out to as many columns as needed:

Name Primary Address Alternative Address Alternative Address_2 Alternative Address_3
John Doe 123 Fudge Rd. UHP.INC
Lee Smith Pro Sports,LLC
Hank Hill Pharm Tank.co PodRacing.Cool GhoulSchool,343 MoneyTree Rd.

CodePudding user response:

First, create a column that contains a list of all "alternative addresses" for each person:

f = lambda g: pd.Series([g['Primary'].dropna().iloc[0],
                         list(g['Alternative'].dropna())], 
                        index=['Primary', 'Alternative'])

new_df = df.groupby('Name').apply(f).reset_index()

If you need, you can then split this column into new columns:

alt_addresses = (pd.DataFrame(new_df['Alternative'].tolist())
                 .add_prefix('Alternative_Address_'))
result = pd.concat([new_df.drop(columns='Alternative'), alt_addresses], axis=1)

Results:

df = pd.DataFrame(
    {'Name': ['John Doe', 'Lee Smith', 'Hank Hill', 'Hank Hill', 'Hank Hill'], 
     'Primary': ['123 Fudge Rd.', 'Pro Sports,LLC', 'Pharm Tank.co', np.nan, np.nan], 
     'Alternative': ['UHP.INC', None, 'PodRacing.Cool', 'GhoulSchool,343', 'MoneyTree Rd.']})
print(new_df)

        Name         Primary                                       Alternative
0  Hank Hill   Pharm Tank.co  [PodRacing.Cool, GhoulSchool,343, MoneyTree Rd.]
1   John Doe   123 Fudge Rd.                                         [UHP.INC]
2  Lee Smith  Pro Sports,LLC                                                []
print(result)

        Name         Primary  Alternative_Address_0  Alternative_Address_1  Alternative_Address_2
0  Hank Hill   Pharm Tank.co         PodRacing.Cool        GhoulSchool,343          MoneyTree Rd.
1   John Doe   123 Fudge Rd.                UHP.INC                   None                   None   
2  Lee Smith  Pro Sports,LLC                   None                   None                   None   
  • Related