Can I visualize the content of a datasets.Dataset?

Time:07-27

I am using the Hugging Face datasets library to load a dataset from a pandas DataFrame. The code is similar to this:

from datasets import Dataset
import pandas as pd
df = pd.DataFrame({"a": [1], "b":[1]})
dataset = Dataset.from_pandas(df) 

Everything went smoothly; however, I wanted to double-check the content of the loaded Dataset. I was looking for something similar to df.head() in pandas, but I found nothing in the official Hugging Face documentation. Is there a way to "read" the content of the loaded dataset, even partially?

A simple print(dataset) does not show the content, only some high-level information:

Dataset({
    features: ['a', 'b'],
    num_rows: 1
})

Answer:

The answer is simpler than you think. Just do

print(dataset[i])

where i is the index of the row (indexing starts at 0).

The output is a dictionary with the column (feature) names as keys and that row's values as values.

print(dataset[0])

# {'a': 1, 'b': 1}