I have been trying to solve this on my own. The exercise statement is:
Exercise 5.1 (code) Using the data stored in the MongoDB collection with the reviews, count:
The number of distinct users who made a review (i.e. count the number of distinct values for the field reviewerID).
The number of distinct users who gave at least one bad rating (less than or equal to 2 stars).
The number of distinct books which received a review (i.e. count the number of distinct values for the field asin).
Here is my code:
n_users = len(db[reviews_collection].distinct("reviewerID"))
n_users_bad_rating = len(db[reviews_collection].distinct("reviewerID", {"overall": {"$lte": 2.0}}))
n_books = len(db[reviews_collection].distinct("asin"))
print(f"There are {n_users} distinct users.")
print(f"There are {n_users_bad_rating} distinct users who gave at least one bad rating (less or equal to 2 stars).")
print(f"There are {n_books} distinct books in the reviews.")
Here is the output I get:
There are 3613 distinct users.
There are 0 distinct users who gave at least one bad rating (less or equal to 2 stars).
There are 3807 distinct books in the reviews.
Example of what the data in the collection looks like:
{'_id': ObjectId('6252c21c2ee307d4d522af0a'),
 'appreciation': 'liked',
 'asin': 'B000R93D4Y',
 'book_index': 0,
 'helpful': [3, 3],
 'overall': 5.0,
 'reviewText': 'A strange world full of strange creatures, knights, and '
               'beautiful maidens. The magical aspect of healing was a nice '
               'touch.',
 'reviewTime': '06 23, 2013',
 'reviewerID': 'A195CNOUUIT4SU',
 'summary': 'Great tale of dragons',
 'train_val_test': 'train',
 'unixReviewTime': 1371945600,
 'user_index': 0}
Question: Why am I unable to use conditions? In other exercises in the notebook where I am asked to query the database, everything works fine unless I try to specify conditions. When I use a loop instead, I get "TypeError: string indices must be integers".
Answer:
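Here is one way to find the distinct reviewerID values with overall <= 2 using the aggregation framework. The pipeline returns a single document whose unique array holds the distinct IDs, so the length of that array is the count you need: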
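// "collection" is a placeholder for the name of the reviews collection.
db.collection.aggregate([
  {
    // Keep only reviews rated 2 stars or lower.
    "$match": {
      "overall": {
        "$lte": 2.0
      }
    }
  },
  {
    // Gather the distinct reviewerID values into a single array.
    "$group": {
      "_id": "TotalUnique",
      "unique": {
        "$addToSet": "$reviewerID"
      }
    }
  }
])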
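Since the question uses PyMongo, the same idea can be sketched in Python. This is only a sketch under a few assumptions: the helper names (bad_rating_pipeline, count_bad_raters, count_bad_raters_in_python) are made up for illustration, the collection argument is anything with a PyMongo-style aggregate() method, and the pipeline groups by reviewerID and uses a $count stage instead of $addToSet so that the server returns the number directly. The last helper is a plain-Python cross-check for a small sample of documents. It skips non-numeric overall values, which mirrors the fact that a numeric $lte does not match a value stored as a string. Whether that is what causes the 0 in the question is a guess, but it is worth checking.

```python
def bad_rating_pipeline(max_rating=2.0):
    """Build a pipeline counting distinct reviewerID values with overall <= max_rating."""
    return [
        # Keep only the reviews at or below the rating threshold.
        {"$match": {"overall": {"$lte": max_rating}}},
        # One output document per distinct reviewerID.
        {"$group": {"_id": "$reviewerID"}},
        # Collapse those documents into a single {"n_users": <count>} document.
        {"$count": "n_users"},
    ]


def count_bad_raters(collection, max_rating=2.0):
    """Run the pipeline on a PyMongo-style collection and return the count."""
    result = list(collection.aggregate(bad_rating_pipeline(max_rating)))
    # With no matching review the pipeline yields no document at all.
    return result[0]["n_users"] if result else 0


def count_bad_raters_in_python(documents, max_rating=2.0):
    """Reference count over plain dicts, for cross-checking a small sample."""
    return len({
        doc["reviewerID"]
        for doc in documents
        # Ignore non-numeric ratings, as a numeric $lte would.
        if isinstance(doc.get("overall"), (int, float))
        and doc["overall"] <= max_rating
    })
```

With the names from the question, this would be called as count_bad_raters(db[reviews_collection]). Note that the reference helper indexes each document (a dict) by field name. Looping over the result of distinct("reviewerID") yields plain strings, and indexing a string with a field name is one common way to get the TypeError mentioned in the question.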