I am trying to make new columns in a dataframe based on how many times a new value gets paired with another column.
original dataframe:
Name | Primary Address | Alternative Address |
---|---|---|
John Doe | 123 Fudge Rd. | UHP.INC |
Lee Smith | Pro Sports,LLC | |
Hank Hill | Pharm Tank.co | PodRacing.Cool |
Hank Hill | GhoulSchool,343 | |
Hank Hill | MoneyTree Rd. |
Dataframe Im trying to achieve where if there is multiple alternative addresses to one name they split out to as many columns as needed:
Name | Primary Address | Alternative Address | Alternative Address_2 | Alternative Address_3 |
---|---|---|---|---|
John Doe | 123 Fudge Rd. | UHP.INC | ||
Lee Smith | Pro Sports,LLC | |||
Hank Hill | Pharm Tank.co | PodRacing.Cool | GhoulSchool,343 | MoneyTree Rd. |
CodePudding user response:
First, create a column that contains a list of all "alternative addresses" for each person:
f = lambda g: pd.Series([g['Primary'].dropna().iloc[0],
list(g['Alternative'].dropna())],
index=['Primary', 'Alternative'])
new_df = df.groupby('Name').apply(f).reset_index()
If you need, you can then split this column into new columns:
alt_addresses = (pd.DataFrame(new_df['Alternative'].tolist())
.add_prefix('Alternative_Address_'))
result = pd.concat([new_df.drop(columns='Alternative'), alt_addresses], axis=1)
Results:
df = pd.DataFrame(
{'Name': ['John Doe', 'Lee Smith', 'Hank Hill', 'Hank Hill', 'Hank Hill'],
'Primary': ['123 Fudge Rd.', 'Pro Sports,LLC', 'Pharm Tank.co', np.nan, np.nan],
'Alternative': ['UHP.INC', None, 'PodRacing.Cool', 'GhoulSchool,343', 'MoneyTree Rd.']})
print(new_df)
Name Primary Alternative
0 Hank Hill Pharm Tank.co [PodRacing.Cool, GhoulSchool,343, MoneyTree Rd.]
1 John Doe 123 Fudge Rd. [UHP.INC]
2 Lee Smith Pro Sports,LLC []
print(result)
Name Primary Alternative_Address_0 Alternative_Address_1 Alternative_Address_2
0 Hank Hill Pharm Tank.co PodRacing.Cool GhoulSchool,343 MoneyTree Rd.
1 John Doe 123 Fudge Rd. UHP.INC None None
2 Lee Smith Pro Sports,LLC None None None