I'm a complete beginner with wandb, so this is a very basic question. I have a dataframe that holds my x features and y values. I'm trying to follow this tutorial to train a model from my pandas dataframe. However, when I try to create a wandb table from my pandas dataframe, I get an error:
wandb.init(project='my-xgb', config={'lr': 0.01})
# Logging 'loss' didn't work, so the call below is commented out for now
#wandb.log({'loss': loss, ...})
# Create a W&B Table with your pandas dataframe
table = wandb.Table(df1)
AssertionError: columns argument expects a list object
I have no idea why this happens, or why it expects a list. In the tutorial, the dataframe doesn't look like a list.
My end goal is to be able to create a wandb table.
CodePudding user response:
Short answer: use table = wandb.Table(dataframe=my_df)
The explanation of your specific case is at the bottom.
Minimal example of using wandb.Table with a DataFrame:
import wandb
import pandas as pd
iris_path = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
iris = pd.read_csv(iris_path)
table = wandb.Table(dataframe=iris)
wandb.log({'dataframe_in_table': table})
(The dataset used here is the Iris dataset, which consists of "3 different types of irises’ (Setosa, Versicolour, and Virginica) petal and sepal length, stored in a 150x4 numpy.ndarray".)
There are two ways of creating W&B Tables according to the official documentation:
- List of Rows: Log named columns and rows of data. For example, wandb.Table(columns=["a", "b", "c"], data=[["1a", "1b", "1c"], ["2a", "2b", "2c"]]) generates a table with two rows and three columns.
- Pandas DataFrame: Log a DataFrame using wandb.Table(dataframe=my_df). Column names will be extracted from the DataFrame.
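To make the list-of-rows shape concrete, here is a minimal sketch in plain Python. The helper rows_from_records is hypothetical (it is not part of wandb); it only shows how named columns and row lists line up before they would be handed to wandb.Table.

```python
def rows_from_records(records):
    """Split a list of dicts into (columns, data) in the list-of-rows shape.

    Hypothetical helper for illustration; assumes every record has the
    same keys as the first one.
    """
    columns = list(records[0].keys())
    data = [[record[name] for name in columns] for record in records]
    return columns, data


records = [
    {"a": "1a", "b": "1b", "c": "1c"},
    {"a": "2a", "b": "2b", "c": "2c"},
]
columns, data = rows_from_records(records)
print(columns)
print(data)
# With wandb installed and a run started, these values would be passed as
# wandb.Table(columns=columns, data=data).
```

If the data is already in a DataFrame, the dataframe= form is shorter and avoids this step.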
Explanation: Why does table = wandb.Table(my_df) give the error "columns argument expects a list object"? Because the __init__ function of wandb.Table looks like this:
def __init__(
    self,
    columns=None,
    data=None,
    rows=None,
    dataframe=None,
    dtype=None,
    optional=True,
    allow_mixed_types=False,
):
If you pass a DataFrame positionally, without the dataframe= keyword, wandb.Table binds it to the first parameter, columns. The check on columns then fails because a DataFrame is not a list.
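The binding behaviour can be reproduced without wandb or pandas. The sketch below uses two hypothetical stand-ins: TableSketch copies only the parameter order quoted above and mimics the list check, and FakeFrame stands in for a DataFrame. Neither is the real library code, so treat this as an illustration of positional versus keyword arguments rather than of wandb internals.

```python
class TableSketch:
    """Hypothetical stand-in with the same parameter order as quoted above."""

    def __init__(self, columns=None, data=None, rows=None, dataframe=None,
                 dtype=None, optional=True, allow_mixed_types=False):
        # Mimics the check behind the reported error (assumed, simplified).
        if columns is not None:
            assert isinstance(columns, list), "columns argument expects a list object"
        self.columns = columns
        self.dataframe = dataframe


class FakeFrame:
    """Placeholder for a pandas DataFrame so the sketch has no dependencies."""


frame = FakeFrame()

# Positional call: the frame is bound to `columns`, so the assertion fires.
try:
    TableSketch(frame)
    positional_error = None
except AssertionError as exc:
    positional_error = str(exc)

# Keyword call: the frame is bound to `dataframe`, and `columns` stays None.
keyword_table = TableSketch(dataframe=frame)

print(positional_error)
print(keyword_table.dataframe is frame)
```

The same reasoning applies to the real class: naming the argument (dataframe=my_df) is what routes the DataFrame to the right parameter.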