Home > Blockchain >  Wide to long format, change column names to match given solution
Wide to long format, change column names to match given solution

Time:11-01

Given this DataFrame:

data = [['Tom', 'run', '2022-01-26', 'run', '2027-01-26'], ['Max', 'stop', '2020-11-16', 'run', '2022-04-26'], ['Bob', 'run', '2021-10-03', 'stop', '2022-01-26'], ['Ben', 'run', '2020-03-11', 'stop', '2013-01-26'], ['Eva', 'stop', '2017-11-16', 'run', '2015-01-26']]
df = pd.DataFrame(data, columns=['Patient', '1_Action', '1_Time', '2_Action', '2_Time'])


  Patient 1_Action      1_Time 2_Action      2_Time
0    Tom      run  2022-01-26      run  2027-01-26
1    Max     stop  2020-11-16      run  2022-04-26
2    Bob      run  2021-10-03     stop  2022-01-26
3    Ben      run  2020-03-11     stop  2013-01-26
4    Eva     stop  2017-11-16      run  2015-01-26

I want it to look like:

  Patient action        time
0    Tom    run  2022-01-26
1    Max   stop  2020-11-16
2    Bob    run  2021-10-03
3    Ben    run  2020-03-11
4    Eva   stop  2017-11-16
5    Tom    run  2027-01-26
6    Max    run  2022-04-26
7    Bob   stop  2022-01-26
8    Ben   stop  2013-01-26
9    Eva    run  2015-01-26

There is a really good answer from @BeRT2me, including the wide_to_long format:

df = pd.wide_to_long(df, 
                     stubnames=['action', 'time'],
                     i='patient',
                     j='num',
                     sep='_').reset_index()

But my column names don't match the solution, they should be: action_1, but they are 1_Action.

The Error message I get is: Value error: the id variables need to be uniquely identify each row.

Is there a way to rename my column names, that they are matching the solution, (Dataframe has about 45 columns so the manual renaming is not an option), or can I change the solution code to work with my given column names?

CodePudding user response:

We need to edit the column name in order to pass them as stubnames in wide_to_long

df.columns = df.columns.str.split('_').map(lambda x : '_'.join(x[::-1]))
#Index(['Patient', 'Action_1', 'Time_1', 'Action_2', 'Time_2'], dtype='object')
df = pd.wide_to_long(df,
                     stubnames=['Action', 'Time'],
                     i='Patient',
                     j='num',
                     sep='_',suffix = '\w ').reset_index()

CodePudding user response:

one option to transform the second dataframe is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

df.pivot_longer(index = 'Patient', 
                names_to = '.value', 
                names_pattern = r"\d_(. )") 
  Patient Action        Time
0     Tom    run  2022-01-26
1     Max   stop  2020-11-16
2     Bob    run  2021-10-03
3     Ben    run  2020-03-11
4     Eva   stop  2017-11-16
5     Tom    run  2027-01-26
6     Max    run  2022-04-26
7     Bob   stop  2022-01-26
8     Ben   stop  2013-01-26
9     Eva    run  2015-01-26

The .value determines which parts of the columns remain as headers. .value is aligned with the parenthesised group in the regex

You could use names_sep instead of names_pattern:

(df
.pivot_longer(
    index = 'Patient', 
    names_to = ('num', '.value'), 
    names_sep='_')
.drop(columns='num')
)
  Patient Action        Time
0     Tom    run  2022-01-26
1     Max   stop  2020-11-16
2     Bob    run  2021-10-03
3     Ben    run  2020-03-11
4     Eva   stop  2017-11-16
5     Tom    run  2027-01-26
6     Max    run  2022-04-26
7     Bob   stop  2022-01-26
8     Ben   stop  2013-01-26
9     Eva    run  2015-01-26

Another option is via pd.stack, which is equally efficient, and usually more efficient than pd.wide_to_long :

temp = df.set_index('Patient')
temp.columns = temp.columns.str.split('_', expand = True)
temp.stack(0).droplevel(-1).reset_index() 
  Patient Action        Time
0     Tom    run  2022-01-26
1     Tom    run  2027-01-26
2     Max   stop  2020-11-16
3     Max    run  2022-04-26
4     Bob    run  2021-10-03
5     Bob   stop  2022-01-26
6     Ben    run  2020-03-11
7     Ben   stop  2013-01-26
8     Eva   stop  2017-11-16
9     Eva    run  2015-01-26
  • Related