Reformat the dataframe

Time:12-27

I need to reshape the dataframe so that each user_id is a row and each website_id is a column.

Each user must appear in exactly one row, and each website_id in exactly one column.

| website_id | url               | user_id|
|------------|-------------------|--------|
|123         |www.google.com     |       1|
|234         |www.flamengo.com.br|       3|
|123         |www.google.com     |       4|
|234         |www.flamengo.com.br|       1|
|345         |www.nasa.gov       |      34|

If the user has accessed a website_id, the corresponding cell should be 1; otherwise it should be 0.

I don't know where to start. Expected result:

|user_id|123|234|345|
|-------|---|---|---|
|1      |1  |1  |0  |
|3      |0  |1  |0  |
|4      |1  |0  |0  |
|34     |0  |0  |1  |

CodePudding user response:

Example

import pandas as pd

data = {'website_id': {0: 123, 1: 234, 2: 123, 3: 234, 4: 345},
        'url': {0: 'www.google.com',
                1: 'www.flamengo.com.br',
                2: 'www.google.com',
                3: 'www.flamengo.com.br',
                4: 'www.nasa.gov'},
        'user_id': {0: 1, 1: 3, 2: 4, 3: 1, 4: 34}}
df = pd.DataFrame(data)

Code

out = pd.crosstab(df['user_id'], df['website_id'])

out

website_id  123  234  345
user_id                  
1             1    1    0
3             0    1    0
4             1    0    0
34            0    0    1
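Note that pd.crosstab counts occurrences, so a user who visits the same site more than once would get a value above 1. A minimal sketch of capping the counts at 1 and flattening the result to match the requested layout, assuming the sample data plus one made-up repeat visit (user 1 on site 123) added only for illustration:

```python
import pandas as pd

# Sample data from the question, plus one hypothetical repeat visit
# (user 1 visiting website 123 a second time) to show the difference.
df = pd.DataFrame({
    'website_id': [123, 234, 123, 234, 345, 123],
    'user_id': [1, 3, 4, 1, 34, 1],
})

# crosstab counts how many times each (user_id, website_id) pair occurs.
counts = pd.crosstab(df['user_id'], df['website_id'])

# Cap every count at 1 so the table holds 0/1 flags only.
flags = counts.clip(upper=1)

# Turn user_id back into a regular column and drop the 'website_id'
# label on the column axis so the header reads user_id, 123, 234, 345.
result = flags.reset_index().rename_axis(columns=None)
print(result)
```

If the data is guaranteed to have no repeated (user_id, website_id) pairs, the clip step is unnecessary.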

CodePudding user response:

IIUC, you can also try this:

df.set_index('user_id')['website_id'].astype(str)\
  .str.get_dummies().groupby(level=0).sum().reset_index()

Output:

   user_id  123  234  345
0        1    1    1    0
1        3    0    1    0
2        4    1    0    0
3       34    0    0    1
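One thing to keep in mind with this approach: because of astype(str), the resulting column labels are strings ('123') rather than integers, and sum() would give values above 1 for repeated visits. A possible variant, assuming you want integer labels and strict 0/1 flags, swaps sum() for max() and converts the labels back:

```python
import pandas as pd

# Sample data from the question (url column omitted, it is not needed here).
df = pd.DataFrame({
    'website_id': [123, 234, 123, 234, 345],
    'user_id': [1, 3, 4, 1, 34],
})

out = (
    df.set_index('user_id')['website_id'].astype(str)
      .str.get_dummies()          # one indicator column per website_id
      .groupby(level=0).max()     # max keeps the flags at 0/1 per user
)

# Column labels are strings at this point; convert them back to integers.
out.columns = out.columns.astype(int)
out = out.reset_index()
print(out)
```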