AttributeError: 'str' object has no attribute 'numpy'


My commands, run on Windows 11 in PowerShell:

!pip install tensorflow-datasets
pip install tensorflow-datasets

# pip install tfds-nightly

import tensorflow_datasets as tfds
datasets = tfds.load("imdb_reviews")

train_set = datasets["train"]         # 25,000 reviews.
test_set = datasets["test"]           # 25,000 reviews.

train_set, test_set = tfds.load("imdb_reviews", split=["train", "test"])
train_set, test_set = tfds.load("imdb_reviews:1.0.0", split=["train", "test"])
train_set, test_set = tfds.load("imdb_reviews:1.0.0", split=["train", "test[:60%]"])
train_set, test_set, valid_set = tfds.load("imdb_reviews:1.0.0", split=["train", "test[:60%]", "test[60%:]"])
train_set, test_set, valid_set = tfds.load("imdb_reviews:1.0.0", split=["train", "test[:60%]", "test[60%:]"], as_supervised = True)

for review, label in train_set.take(2):
    print(review.numpy().decode("utf-8"))
    print(label.numpy())

PS C:\Users\donhu> python
Python 3.9.0 (tags/v3.9.0:9cf6752, Oct  5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow_datasets as tfds
>>> datasets = tfds.load("imdb_reviews")
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to C:\Users\donhu\tensorflow_datasets\imdb_reviews\plain_text\1.0.0...
Dl Size...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:51<00:00,  1.55 MiB/s]
Dl Completed...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:51<00:00, 51.53s/ url]
Generating splits...:   0%|                                                                                                      | 0/3 [00:00<?, ? splits/s]
Generating train examples...: 7479 examples [00:02, 6153.86 examples/s]
Dataset imdb_reviews downloaded and prepared to C:\Users\donhu\tensorflow_datasets\imdb_reviews\plain_text\1.0.0. Subsequent calls will reuse this data.
2022-12-01 19:48:16.580270: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-01 19:48:17.465275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3994 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1660 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5
>>> train_set = tfds.load("imdb_reviews")
>>> test_set = datasets["test"]
>>> train_set, test_set = tfds.load("imdb_reviews", split=["train", "test"])
>>> for review, label in train_set.take(2):
... print(review.numpy().decode("utf-8"))
  File "<stdin>", line 2
IndentationError: expected an indented block
>>> print(review.numpy().decode("utf-8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'review' is not defined
>>>     Windows 11 PowerShell.
  File "<stdin>", line 1
    Windows 11 PowerShell.
IndentationError: unexpected indent
>>> !pip install tensorflow-datasets
  File "<stdin>", line 1
    !pip install tensorflow-datasets
SyntaxError: invalid syntax
>>> pip install tensorflow-datasets
  File "<stdin>", line 1
    pip install tensorflow-datasets
SyntaxError: invalid syntax
>>> # pip install tfds-nightly
>>> import tensorflow_datasets as tfds
>>> datasets = tfds.load("imdb_reviews")
>>> train_set = tfds.load("imdb_reviews") # 25.000 reviews.
>>> test_set = datasets["test"]           # 25.000 reviews.
>>> train_set, test_set = tfds.load("imdb_reviews", split=["train", "test"])
>>> train_set, test_set = tfds.load("imdb_reviews:1.0.0", split=["train", "test"])
>>> train_set, test_set = tfds.load("imdb_reviews:1.0.0", split=["train", "test[:60%]"])
>>> train_set, test_set, valid_set = tfds.load("imdb_reviews:1.0.0", split=["train", "test[:60%]"], "test[60%:]")
  File "<stdin>", line 1
    train_set, test_set, valid_set = tfds.load("imdb_reviews:1.0.0", split=["train", "test[:60%]"], "test[60%:]")
SyntaxError: positional argument follows keyword argument
>>> train_set, test_set, valid_set = tfds.load("imdb_reviews:1.0.0", split=["train", "test[:60%]", "test[60%:]"])
>>> for review, label in train_set.take(2):
...     print(review.numpy().decode("utf-8"))
...     print(label.numpy())
2022-12-01 20:07:51.639683: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: 'str' object has no attribute 'numpy'

How to fix it?

You are mixing two interpreters. PowerShell only understands PowerShell commands and cannot interpret Python code, and the Python prompt (>>>) cannot run shell commands such as pip install, which is why those lines in your session raise SyntaxError.

Put the Python code in a file, e.g. my_code.py, and run it from PowerShell with python my_code.py. The Python interpreter then executes the script. See "How to run python code" for details.
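
One more thing worth checking, since the traceback comes from the for loop itself: the last tfds.load call that actually ran in the session has no as_supervised=True. As far as I know, each element is then a dict (with keys such as "text" and "label") rather than a (text, label) tuple, and unpacking a dict gives you its keys, which are plain strings. Below is a minimal sketch of that effect. It uses ordinary Python dicts as stand-ins for dataset elements, so the sample data and the helper name unpack_pairs are made up for illustration, and I have not run it against your setup.

```python
# Plain dicts stand in for dataset elements. The keys "text" and "label" are
# assumed to match what tfds.load("imdb_reviews") yields WITHOUT as_supervised=True.
unsupervised_examples = [
    {"label": 0, "text": b"A dull film."},
    {"label": 1, "text": b"A wonderful film."},
]


def unpack_pairs(examples):
    """Mimic `for review, label in dataset:` and collect what gets bound."""
    pairs = []
    for review, label in examples:  # unpacking a dict yields its KEYS
        pairs.append((review, label))
    return pairs


pairs = unpack_pairs(unsupervised_examples)
print(pairs)                          # both names hold key strings, not tensors
print(hasattr(pairs[0][0], "numpy"))  # a str has no .numpy(), hence the AttributeError

# With as_supervised=True (as in the last tfds.load line of the question's code),
# each element is a (text, label) tuple instead, so the same unpacking works.
supervised_examples = [(b"A dull film.", 0), (b"A wonderful film.", 1)]
for review, label in supervised_examples:
    print(review.decode("utf-8"), label)
```

If that is what is happening, re-running the tfds.load line that includes as_supervised=True before the loop should make review and label tensors again, so review.numpy() becomes valid.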

