I loaded a CSV file in a Jupyter notebook with the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import seaborn as sns
sns.set()
from sklearn.preprocessing import LabelEncoder
sales_data = pd.read_csv('Buymore_sales_data.csv')
sales_data.head(13801)
The output showed only part of the data, with some rows hidden, even though it reported the size as 13799 rows × 7 columns.
Link to the output screenshot - salesdataset
I also want to calculate the average sales per market, since each market appears more than twice in the data. To do this, I wrote:
sales_data.Kumasi.Sales.mean()
Running this gives the following error:
-------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [48], in <cell line: 2>()
1 # compute average sales per market
----> 2 sales_data.Kumasi.Sales.mean()
File ~\anaconda3\lib\site-packages\pandas\core\generic.py:5575, in NDFrame.__getattr__(self, name)
5568 if (
5569 name not in self._internal_names_set
5570 and name not in self._metadata
5571 and name not in self._accessors
5572 and self._info_axis._can_hold_identifiers_and_holds_name(name)
5573 ):
5574 return self[name]
-> 5575 return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Kumasi'
What can I do?
- First, I want to see the entire dataset displayed in the Jupyter notebook.
- Then, I want to calculate the average sales per market, if possible.
Answer:
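The `AttributeError` comes from how attribute access works: `sales_data.Sales` resolves to a column only because `Sales` is a column label, whereas `Kumasi` appears to be a value stored in the `Market` column, not a column name. Here is a minimal sketch on a made-up two-row DataFrame, assuming the columns are named `Market` and `Sales` (the names used in this answer's filtering code):

```python
import pandas as pd

# Hypothetical mini version of sales_data; the column names are assumed,
# the rows are made up for illustration.
df = pd.DataFrame({"Market": ["Kumasi", "Accra"], "Sales": [100.0, 250.0]})

column_mean = df.Sales.mean()  # works: 'Sales' is a column label

error_message = ""
try:
    df.Kumasi  # fails: 'Kumasi' is a value in 'Market', not a column label
except AttributeError as err:
    error_message = str(err)  # "'DataFrame' object has no attribute 'Kumasi'"

# Select rows by value with a boolean mask, then take the 'Sales' column
kumasi_mean = df.loc[df["Market"] == "Kumasi", "Sales"].mean()

print(column_mean, error_message, kumasi_mean)
```

In short, use attribute or bracket access for column names, and a boolean mask (or `query`) for values inside a column.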
To display the entire table:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# Instead of None, you can pass the maximum number of columns/rows you want displayed.
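If you only need the full table occasionally, `pd.option_context` sets the display options temporarily and restores the previous values when the `with` block ends. This is a sketch on a hypothetical 100-row DataFrame. The default `display.max_rows` is 60, so the normal repr of this DataFrame is truncated:

```python
import pandas as pd

# Hypothetical stand-in for sales_data: 100 rows of made-up values,
# more than the default display.max_rows, so pandas truncates its repr.
df = pd.DataFrame({
    "Market": ["Kumasi", "Accra", "Tamale", "Takoradi"] * 25,
    "Sales": range(100),
})

truncated = repr(df)  # uses the global display options

# The options apply only inside the with block; afterwards the
# previous global values are restored automatically.
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    full = repr(df)

print(full)
```

In a notebook, call `display(sales_data)` inside the `with` block. A bare `sales_data` line there is not auto-displayed, because it is not the last top-level expression of the cell.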
To calculate the average sales for a single market:
sales_data[sales_data['Market'] == 'Kumasi']['Sales'].mean()
# Or
sales_data.query('Market == "Kumasi"')['Sales'].mean()
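Because the goal is the average for every market, `groupby` can compute all of them in one call instead of filtering one market at a time. The sketch below uses made-up values and again assumes the `Market` and `Sales` column names:

```python
import pandas as pd

# Hypothetical stand-in for sales_data; the values are made up.
sales_data = pd.DataFrame({
    "Market": ["Kumasi", "Accra", "Kumasi", "Tamale", "Accra", "Kumasi"],
    "Sales": [100.0, 250.0, 300.0, 80.0, 350.0, 200.0],
})

# One mean per market; the result is a Series indexed by market name
avg_sales_per_market = sales_data.groupby("Market")["Sales"].mean()
print(avg_sales_per_market)

# A single market can then be looked up by its label
print(avg_sales_per_market["Kumasi"])  # 200.0
```

The result is sorted by market name. Call `.sort_values(ascending=False)` on it if you would rather rank markets by average sales.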