I have two columns, phones and emails, that need to be exploded into rows. I have figured out how to do this to either one, but not both simultaneously. The biggest problem is that I may have 0 to many phones and 0 to many emails. So, if a customer has three emails and no phones, then I need 3 rows. If they have four phones and three emails, then I need 4 rows. One for each phone, and the three emails in those four rows. Example data:
| many columns | phones | emails |
|:-------------|:------:|:-------|
| row 1 | A,B,C | A,B |
| row 2 | | D,E,F |
Example Results:
| many columns | phones | emails |
|:-------------|:------:|:-------|
| row 1 | A | A |
| row 1 | B | B |
| row 1 | C | |
| row 2 | | D |
| row 2 | | E |
| row 2 | | F |
# Convert cell contents into lists rather than strings
df0['phones'] = df0['phones'].str.split(";", expand=False)
df0['emails'] = df0['emails'].str.split(",", expand=False)
df0 = df0.apply(pd.Series.explode) # DOES NOT WORK
When I try the above code, I get the error:
ValueError: cannot reindex on an axis with duplicate labels
CodePudding user response:
I assume the index on your original dataframe is unique. If not, run df = df.reset_index()
before the following snippet:
columns = ["phones", "emails"]
# Explode each column individually, but instead of using `explode`, we will
# use`stack` to give us a second index level
exploded = [
df[col].str.split(",", expand=True).stack().rename(col)
for col in columns
]
# Align the exploded columns
exploded = pd.concat(exploded, axis=1).droplevel(-1)
# Merge it with the original data frame
result = pd.concat([df.drop(columns=columns), exploded], axis=1)