I need to reshape the dataframe so that each user_id is a row and each website_id is a column.
Each user_id must appear on exactly one row, and each website_id must appear in exactly one column.
| website_id | url | user_id|
|------------|-------------------|--------|
|123 |www.google.com | 1|
|234 |www.flamengo.com.br| 3|
|123 |www.google.com | 4|
|234 |www.flamengo.com.br| 1|
|345 |www.nasa.gov | 34|
If the user has accessed a given website_id, the corresponding column should be 1; otherwise it should be 0.
I don't know where to start. The expected result:
|user_id|123|234|345|
|-------|---|---|---|
|1 |1 |1 |0 |
|3 |0 |1 |0 |
|4 |1 |0 |0 |
|34 |0 |0 |1 |
CodePudding user response:
Example
import pandas as pd

data = {'website_id': {0: 123, 1: 234, 2: 123, 3: 234, 4: 345},
        'url': {0: 'www.google.com',
                1: 'www.flamengo.com.br',
                2: 'www.google.com',
                3: 'www.flamengo.com.br',
                4: 'www.nasa.gov'},
        'user_id': {0: 1, 1: 3, 2: 4, 3: 1, 4: 34}}
df = pd.DataFrame(data)
Code
# crosstab counts rows; clip caps repeat visits at 1 so every cell is 0 or 1
out = pd.crosstab(df['user_id'], df['website_id']).clip(upper=1)
out
         123  234  345
user_id
1          1    1    0
3          0    1    0
4          1    0    0
34         0    0    1
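One thing to watch: pd.crosstab counts rows, so a user who visited the same website more than once would get a value above 1 rather than a 0/1 flag. The sample data has no such repeats, so the following is only a minimal sketch, assuming a hypothetical extra row in which user 1 visits website 123 a second time:

```python
import pandas as pd

# Hypothetical data: the last row repeats user 1 / website 123
# (this repeat is an assumption, not part of the original example).
df = pd.DataFrame({
    'website_id': [123, 234, 123, 234, 345, 123],
    'user_id':    [1, 3, 4, 1, 34, 1],
})

counts = pd.crosstab(df['user_id'], df['website_id'])  # raw visit counts
flags = counts.clip(upper=1)                           # cap at 1 -> 0/1 indicator

print(flags)
```

Here counts holds 2 for user 1 and website 123, while flags holds 1, which matches the 1/0 requirement in the question.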
CodePudding user response:
IIUC, you can also try this:
# max() rather than sum() keeps each cell at 0/1 even if a user visits a site more than once
df.set_index('user_id')['website_id'].astype(str)\
  .str.get_dummies().groupby(level=0).max().reset_index()
Output:
   user_id  123  234  345
0        1    1    1    0
1        3    0    1    0
2        4    1    0    0
3       34    0    0    1
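Note that because of .astype(str), the website columns in this result are labelled with strings ('123') rather than integers (123). If integer labels matter downstream, a possible sketch (the variable names here are just for illustration) is to convert the column labels back before resetting the index:

```python
import pandas as pd

df = pd.DataFrame({
    'website_id': [123, 234, 123, 234, 345],
    'user_id':    [1, 3, 4, 1, 34],
})

# One indicator column per website_id, collapsed to one row per user_id.
out = (df.set_index('user_id')['website_id'].astype(str)
         .str.get_dummies()
         .groupby(level=0).max())

# get_dummies produced string labels; convert them back to integers.
out.columns = out.columns.astype(int)
out = out.reset_index()

print(out)
```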