I am using the Hugging Face datasets library to load a dataset from a pandas DataFrame. The code is similar to this:
from datasets import Dataset
import pandas as pd
df = pd.DataFrame({"a": [1], "b":[1]})
dataset = Dataset.from_pandas(df)
Everything went smoothly; however, I wanted to double-check the content of the loaded Dataset. I was looking for something similar to df.head() in pandas, but I found nothing in the official Hugging Face documentation. Is there a way to "read", even partially, the content of the loaded dataset?
A simple print(dataset) does not show the content, only some high-level information:
Dataset({
    features: ['a', 'b'],
    num_rows: 1
})
CodePudding user response:
The answer is simpler than you think. Just do print(dataset[i]), where i is the index of the row (the first row is 0). The output will be a dictionary with the features as keys and the content of the row as values.
print(dataset[0])
# Output: {'a': 1, 'b': 1}