We're using AWS Personalize to get a personalized ranking of various items in our feed for a specific user.
We are also using a filter that looks like
EXCLUDE ItemID WHERE Interactions.event_type IN ("*")
This filter is taken from an AWS blog that states
To remove all items that a user has previously interacted with, use the following filter expression:
EXCLUDE itemId WHERE INTERACTIONS.event_type in ("*")
To test this, I use the campaign page in the console: https://console.aws.amazon.com/personalize/home?region=us-east-1#arn:aws:personalize:us-east-1::dataset-group$<dataset_name>/campaigns/campaignDetail/<campaign_arn>
I enter userId=5253ffbb-f5e3-4e71-9a33-91ee65365c7d
and the following item IDs:
5829, 5480, 2275, 6706, 5438, 6444, 6444, 7461, 7599, 4384, 6747, 7499, 6491, 5453, 7605, 5985, 6663, 7174, 1094, 6474, 7357, 7220, 8370, 7445, 5721, 991, 5592, 9283, 7547, 8676, 8872, 8092, 9401, 8645, 2090, 7684, 3788, 5849, 6524, 8480, 7299, 5752, 8007, 9100, 7422, 8640, 7917, 9254, 10050, 9851, 1744, 4227, 6388, 9490, 6481, 5744, 6486, 9040, 4048, 8170, 9623, 7966, 8560, 5336, 3885, 4441, 10442, 6842, 4898, 567, 4214, 125, 9556, 10039, 5494, 9447, 10051, 8302, 9482, 6649, 9133, 4828, 8288, 62, 9680, 4792, 10785, 9727, 10777, 11366, 10252, 9728, 2450, 10463, 9578, 4246, 10154, 10793, 10299, 6733, 10597, vy7erddv, 9247, 9816, 8385, 9589, 10845, 10368, 11427, 11405, 10475, 11273, 11392, 11335, 5871, 10465, 10927, 9371, 9894, 10773, 10747, 11274, 11349, 10831, 9882, vaxq362m, m3g32ayv, 5wqa8r4v, km7kl7kv, 3wno92pm, 3m483l5v, pv9rallv, lmr4dn8v
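For anyone reproducing this outside the console, here is a minimal sketch of the same query made through boto3's `personalize-runtime` client. The helper names `build_ranking_request` and `get_ranking` are hypothetical, the ARNs are placeholders, and de-duplicating the input list is my own choice (6444 appears twice above), not something Personalize is known to require.

```python
def build_ranking_request(campaign_arn, filter_arn, user_id, item_ids):
    """Build the keyword arguments for get_personalized_ranking."""
    # De-duplicate while preserving order (an assumption, not an API requirement).
    unique_ids = list(dict.fromkeys(str(i) for i in item_ids))
    return {
        "campaignArn": campaign_arn,
        "userId": user_id,
        "inputList": unique_ids,
        "filterArn": filter_arn,  # ARN of the EXCLUDE ... IN ("*") filter
    }


def get_ranking(request):
    """Send the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("personalize-runtime", region_name="us-east-1")
    response = client.get_personalized_ranking(**request)
    # Each entry in personalizedRanking carries the ranked item's ID.
    return [item["itemId"] for item in response["personalizedRanking"]]
```

Calling `get_ranking(build_ranking_request(...))` before and after logging events makes it easy to diff which items the filter dropped.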
Now I log interactions of this user with some items and reload the console recommendations...
This appears to work as expected and the items get filtered from the list if the user already interacted with them.
To my surprise, however, these items do NOT remain filtered indefinitely. If I continue logging interactions against other items for this user, later reloads of the recommendations may again include items the user has already interacted with. Given enough time (about a day), all of the items seem to come back for this user.
I can't figure out why it behaves like this.
The interactions get tracked as
POST https://personalize-events.us-east-1.amazonaws.com/events
{
  "eventList": [
    {
      "eventType": "list_view",
      "ITEM_ID": "vaxq362m",
      "properties": "{\"itemType\": \"artwork\", \"itemId\": \"vaxq362m\"}",
      "sentAt": {{$timestamp}}
    }
  ],
  "sessionId": "xxx1234",
  "trackingId": "<OUR_TRACKING_ID>",
  "userId": "5253ffbb-f5e3-4e71-9a33-91ee65365c7d"
}
And this seems to work because
- The status of the response is 200
- The interaction appears in the CSV if I export the interactions dataset
- The items DO get removed INITIALLY for a short time from the recommendations that come back
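As a point of comparison, here is a rough sketch of the same event sent through boto3's `personalize-events` client instead of a raw POST. The helper names are hypothetical and the tracking ID is a placeholder. Note that the boto3 event shape uses the key `itemId`, whereas the raw request above sends `ITEM_ID`; whether that difference matters here is not something I have verified.

```python
import json
from datetime import datetime, timezone


def build_put_events_request(tracking_id, user_id, session_id, item_id,
                             event_type="list_view", sent_at=None):
    """Build the keyword arguments for put_events (one event per call)."""
    sent_at = sent_at or datetime.now(timezone.utc)
    return {
        "trackingId": tracking_id,
        "userId": user_id,
        "sessionId": session_id,
        "eventList": [
            {
                "eventType": event_type,
                "itemId": item_id,
                "sentAt": sent_at,  # boto3 expects a datetime here
                # properties must be a JSON-encoded string
                "properties": json.dumps({"itemType": "artwork"}),
            }
        ],
    }


def send_event(request):
    """Send the event; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("personalize-events", region_name="us-east-1")
    client.put_events(**request)
```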
Answer:
Filtering on the interactions dataset does not take the full history for the user into account. From the docs:
Amazon Personalize considers up to 200 historical interactions for a user, and up to 100 streamed interactions you record for the user with the PutEvents operation. Additionally, the number of historical interactions Amazon Personalize considers for a user depends on the max_user_history_length_percentile and min_user_history_length_percentile hyperparameters you defined before training.
For example, if you used .99 for the max_user_history_length_percentile, and 99% of your users have at most 4 interactions, Amazon Personalize will only filter based on the user's most recent 4 historical interactions. If a user has less than the number historical interactions at the min_user_history_length_percentile, Amazon Personalize doesn't consider the user's interactions when filtering.
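To make the docs' example concrete, here is a small pure-Python model of the behaviour they describe. It is only an illustration, not how Personalize is implemented: the function names are hypothetical, the nearest-rank percentile method is an assumption, and the 200 limit is taken from the quoted text.

```python
import math


def history_cap(history_lengths, max_percentile, hard_limit=200):
    """How many recent historical interactions are considered for filtering."""
    ordered = sorted(history_lengths)
    # Nearest-rank percentile; round() guards against float noise before ceil().
    rank = max(1, math.ceil(round(max_percentile * len(ordered), 9)))
    return min(ordered[rank - 1], hard_limit)


def excluded_items(user_history, cap):
    """Items the filter would exclude: only the `cap` most recent interactions."""
    return set(user_history[-cap:]) if cap > 0 else set()


# 99 users with 4 interactions each and one heavy user with 150.
lengths = [4] * 99 + [150]
cap = history_cap(lengths, 0.99)       # 4, as in the docs' example
history = ["a", "b", "c", "d", "e", "f"]  # oldest first
print(excluded_items(history, cap))    # "a" and "b" are no longer excluded
```

Under this model, each new interaction pushes the oldest one out of the window, which would match previously filtered items reappearing as more events are logged.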
To filter based on up to 200 historical interactions for a user, set the max_user_history_length_percentile to 1.0 and retrain the model.
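A minimal sketch of what that retraining request could look like with boto3, assuming the Personalized-Ranking recipe and assuming the percentile is passed through `algorithmHyperParameters` as a string. The helper names and the ARNs are placeholders, so check them against your own setup.

```python
def build_solution_request(name, dataset_group_arn,
                           recipe_arn="arn:aws:personalize:::recipe/aws-personalized-ranking"):
    """Build the keyword arguments for create_solution."""
    return {
        "name": name,
        "datasetGroupArn": dataset_group_arn,
        "recipeArn": recipe_arn,
        "solutionConfig": {
            # Hyperparameter values are passed as strings.
            "algorithmHyperParameters": {
                "max_user_history_length_percentile": "1.0",
            }
        },
    }


def create_solution(request):
    """Create the solution; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("personalize", region_name="us-east-1")
    return client.create_solution(**request)["solutionArn"]
```

After the solution is created, a new solution version still has to be trained and the campaign pointed at it before the filter sees the longer history.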