I have a users database. I want to separate them into two columns to have user1 and user2. The way I was solving this was to split the names into multiple columns then merge the names to have the two columns of users.
The issue I run into is some names are long and after the split. Those names take some spot on the data frame which makes it harder to merge properly.
Users |
---|
Maria Melinda Del Valle Justin Howard |
Devin Craig Jr. Michael Carter III |
Jeanne De Bordeaux Alhamdi |
After I split the user columns
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
Maria | Melinda | Del | Valle | Justin | Howard | |||
Devin | Craig | Jr. | Michael | Carter | III | |||
Jeanne | De | Bordeaux | Alhamdi |
The expected result is the following
User1 | User2 |
---|---|
Maria Melinda Del valle | Justin Howard |
Devin Craig Jr. | Michael Carter III |
Jeanne De Bordeaux | Alhamdi |
CodePudding user response:
What database server are you using ? I suggest to write a store procedure, that will have two local variables with string (nvarchar) type and a new temp table. You will need to use the split columns, that you produced earlier. Each time you will find empty string in a column you will append to the new variable. In each row you will store the two local variables in the new temp table. Finally, you can select the results from temp table as the store procedure result
CodePudding user response:
You can use:
def f(sr):
m = sr.isna().cumsum().loc[lambda x: x < 2]
return sr.dropna().groupby(m).apply(' '.join)
out = df.apply(f, axis=1).rename(columns=lambda x: f'User{x 1}')
Output:
>>> out
User1 User2
0 Maria Melinda Del Valle Justin Howard
1 Devin Craig Jr. Michael Carter III
2 Jeanne De Bordeaux Alhamdi
As suggested by @Barmar, If you know where to put the blank columns in the first split, you should know how to create both columns.