Home > Blockchain >  Build the N columns for each id based on the other columns value in data frame
Build the N columns for each id based on the other columns value in data frame

Time:03-24

I have a data frame with three columns. The first one shows the users, the second one is date and the third one show the holiday or not. (1 is holiday and 0 is not holiday).

import pandas as pd 
df = pd.DataFrame()
df['user'] = ['a', 'a', 'b','b', 'c', 'c', 'c']
df['date'] = ['2012-01-01', '2012-01-03', '2012-01-01','2012-01-02', '2012-01-01',   '2012-01-02','2012-01-03']
df['holiday']  = [1, 0, 1, 0, 1,0, 0]

I want for each user and time, build two columns and then put them together, if a user have not a date, put 2 in that columns. For example, for user=a, we have two dates. Then, for this user I should build 1, 1, 2, 2, 0,0. Or, for user b, the rows is 1, 1, 0,0, 2,2.

For user a:

  • date 2012-01-01, we have date and since it is holiday. Then we add two columns 1, 1.
  • date 2012-01-02 : we dont have date, then we build two columns with 2,2.
  • date 2012-01-03 : we have date and its not holiday, then we build two columns with 0,0.

For user b:

  • date 2012-01-01, since it is holiday. Then we add two columns 1, 1.
  • date 2012-01-02 : we have date, then we build two columns with 0,0.
  • date 2012-01-03 : we dont have date. then we build two columns with 2,2.

For user c:

  • date 2012-01-01, since it is holiday. Then we add two columns 1, 1.
  • date 2012-01-02 : we have date, then we build two columns with 0,0.
  • date 2012-01-03 : we have date, then we build two columns with 0,0.

Here is the dataframe which I want.

enter image description here

I've tried to use this code, but it gives me an error:

out              = (df.reindex(df.index.repeat(2)).assign(col=lambda x: x.groupby('user').\
                               cumcount()).pivot_table( 'date', 'user', 'col', fill_value=2).add_prefix('val').rename_axis(index=None, columns=None))

CodePudding user response:

You can pivot, fillna, and then unstack twice:

tmp = df.pivot(index='date', columns='user', values='holiday').fillna(2).astype(int).unstack().unstack()
tmp = tmp[tmp.columns.repeat(2)]

# Formatting
tmp = tmp.set_axis(np.arange(tmp.shape[1]) 1, axis=1).add_prefix('val').reset_index()

Output:

  user  val1  val2  val3  val4  val5  val6
0    a     1     1     2     2     0     0
1    b     1     1     0     0     2     2
2    c     1     1     0     0     0     0
  • Related