Not able to specify a condition for db[collection].distinct

Time:04-10

I have been trying to solve this on my own. The exercise statement is:

Exercise 5.1 (code) Using the data stored in the MongoDB collection with the reviews, count

  • The number of distinct users who made a review (i.e. count the number of distinct values for the field reviewerID)

  • The number of distinct users who gave at least one bad rating (less than or equal to 2 stars)

  • The number of distinct books which received a review (i.e. count the number of distinct values for field asin).

Here is my code:

n_users = len(db[reviews_collection].distinct("reviewerID"))
n_users_bad_rating = len(db[reviews_collection].distinct("reviewerID", {"overall": {"$lte": 2.0}}))
n_books = len(db[reviews_collection].distinct("asin"))

print(f"There are {n_users} distinct users.")
print(f"There are {n_users_bad_rating} distinct users who gave at least one bad rating (less or equal to 2 stars).")
print(f"There are {n_books} distinct books in the reviews.")

Here is the output I get:

There are 3613 distinct users.
There are 0 distinct users who gave at least one bad rating (less or equal to 2 stars).
There are 3807 distinct books in the reviews.

Example of what the data in the collection looks like:

{'_id': ObjectId('6252c21c2ee307d4d522af0a'),
 'appreciation': 'liked',
 'asin': 'B000R93D4Y',
 'book_index': 0,
 'helpful': [3, 3],
 'overall': 5.0,
 'reviewText': 'A strange world full of strange creatures, knights, and '
               'beautiful maidens.  The magical aspect of healing was a nice '
               'touch.',
 'reviewTime': '06 23, 2013',
 'reviewerID': 'A195CNOUUIT4SU',
 'summary': 'Great tale of dragons',
 'train_val_test': 'train',
 'unixReviewTime': 1371945600,
 'user_index': 0}

Question: Why am I unable to use conditions? The other exercises in the notebook that ask me to query the database work perfectly fine, except when I try to specify conditions. When I use a loop instead, I get "TypeError: string indexes must be int".
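A guess at the TypeError, since the loop itself is not shown: distinct("reviewerID") returns a plain list of strings, so a loop over that list which indexes each item with a field name, such as item["overall"], would raise this kind of error. Looping over whole documents, as find() would yield them, avoids it. The sketch below is only an illustration under that assumption; it uses hypothetical in-memory documents instead of a live collection, and the helper names are made up.

```python
# Hypothetical documents shaped like the sample above (only the relevant fields).
docs = [
    {"reviewerID": "A195CNOUUIT4SU", "overall": 5.0},
    {"reviewerID": "USER_B", "overall": 2.0},
    {"reviewerID": "USER_B", "overall": 1.0},
    {"reviewerID": "USER_C", "overall": 4.0},
]

# Stand-in for what distinct("reviewerID") hands back: strings, not documents.
reviewer_ids = sorted({doc["reviewerID"] for doc in docs})


def indexing_with_field_name_fails(item):
    """Return True if item["overall"] raises TypeError, as it does for a str."""
    try:
        item["overall"]
    except TypeError:
        return True
    return False


def distinct_bad_raters(documents, max_stars=2.0):
    """Collect the reviewerID of every document rated max_stars or lower."""
    return {d["reviewerID"] for d in documents if d["overall"] <= max_stars}


print(indexing_with_field_name_fails(reviewer_ids[0]))  # a string item
print(indexing_with_field_name_fails(docs[0]))          # a full document
print(len(distinct_bad_raters(docs)))
```

If the loop in the notebook iterates over something else, this explanation does not apply.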

Answer:

Here is one way to find the distinct reviewerID values with overall <= 2 using the aggregation framework:

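db.collection.aggregate([
  // Keep only the reviews rated 2 stars or lower.
  {
    "$match": {
      "overall": { "$lte": 2.0 }
    }
  },
  // Collect the distinct reviewerID values into a single array.
  {
    "$group": {
      "_id": "TotalUnique",
      "unique": { "$addToSet": "$reviewerID" }
    }
  }
])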
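
Since the question uses PyMongo rather than the shell, the same idea can be sketched in Python. This is a sketch under assumptions, not a definitive fix: the helper names and the n_users output field are made up for illustration, and collection stands for whatever db[reviews_collection] is in the notebook. Instead of building one array with $addToSet, this variant groups by reviewerID and lets a $count stage count the groups, so the server returns a single number.

```python
def build_bad_rating_pipeline(max_stars=2.0):
    """Build a pipeline that counts distinct reviewers with a rating <= max_stars."""
    return [
        # Keep only the reviews rated max_stars or lower.
        {"$match": {"overall": {"$lte": max_stars}}},
        # One group per reviewer, so each reviewer is counted once.
        {"$group": {"_id": "$reviewerID"}},
        # Count the groups; "n_users" is an arbitrary field name.
        {"$count": "n_users"},
    ]


def count_distinct_bad_raters(collection, max_stars=2.0):
    """Run the pipeline on a PyMongo-style collection and return the count."""
    cursor = collection.aggregate(build_bad_rating_pipeline(max_stars))
    # $count emits no document at all when nothing matched, so default to 0.
    first = next(iter(cursor), None)
    return 0 if first is None else first["n_users"]


# Usage in the notebook (assumes `db` and `reviews_collection` from the question):
# n_users_bad_rating = count_distinct_bad_raters(db[reviews_collection])
```

PyMongo's distinct does accept a filter as its second argument, so the call in the question looks valid as written. If this pipeline also reports 0, the condition syntax is probably not the cause, and it may be worth checking how overall is actually stored in the collection being queried.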
