I am trying to extract some values from a Pandas DataFrame, but when I extract them I get not only the content of the column but also its metadata (index, name, and dtype). How can I extract only the content, ideally as a string?
My code is as follows:
import os
import webbrowser
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
dataset_path = os.path.join(os.getcwd(), "../data/flavors_of_cacao.csv")
chocodata = pd.read_csv(dataset_path)
# Rename the columns to better manage them
new_colnames = ['company', 'bean_origin', 'ref', 'review_date', 'cocoa_percent',
                'company_location', 'rating', 'bean_type', 'bean_origin']
chocodata = chocodata.rename(columns=dict(zip(chocodata.columns, new_colnames)))
# Get all the unique values of the company column (416 in total)
companies = chocodata.company.unique()
# Get all the unique values of the company_location column (60 in total)
countries = chocodata.company_location.unique()
# Create a dataframe in which to save the best company for each country
d = {'country': [],
     'best_company': [],
     'avg_rating': []}
loc_companies = pd.DataFrame(d)
# Create a dataframe in which to save the companies and their average evaluations
d = {'company': [],
     'avg_rating': []}
avg_companies = pd.DataFrame(d)
# Set the country (for testing purposes)
c = 'U.S.A.'
companies_country = chocodata.loc[chocodata['company_location'] == c]
# For each company in country c, compute the average rating of its bars
for comp in companies_country.company:
    company = companies_country[companies_country.company == comp]
    mean_rating = company['rating'].mean()
    d = pd.DataFrame(
        {'company': [comp],
         'avg_rating': [mean_rating]})
    tmp = [avg_companies, d]
    avg_companies = pd.concat(tmp)
# Remove the duplicate values
avg_companies.drop_duplicates(subset = 'company', keep = False, inplace = True)
# Extract the best company for country c
best_company = avg_companies.nlargest(1, ['avg_rating'])
# Insert the company in the dataframe
d = pd.DataFrame(
    {'country': [c],
     'best_company': [best_company['company']],
     'avg_rating': [best_company['avg_rating']]})
tmp = [loc_companies, d]
loc_companies = pd.concat(tmp)
If I print the best_company['company'] variable, I obtain the following:
0    Dole (Guittard)
Name: company, dtype: object
However, I would like to keep only the Dole (Guittard) name as a string, instead of all this content. Could someone help me?
CodePudding user response:
You should use to_numpy() for this instead of .values, whose use the pandas documentation discourages:
"We recommend using DataFrame.to_numpy() instead."
best_company['company'].to_numpy() returns a one-element array, so best_company['company'].to_numpy()[0] should give you the string you're looking for.
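As a minimal sketch of that suggestion, the snippet below builds a hypothetical one-row frame standing in for the result of nlargest(1, ['avg_rating']); the rating value is made up for illustration and is not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the one-row result of nlargest(1, ['avg_rating']);
# the rating value is invented for this example.
best_company = pd.DataFrame({"company": ["Dole (Guittard)"], "avg_rating": [4.0]})

# to_numpy() on the selected column gives a one-element NumPy array...
as_array = best_company["company"].to_numpy()

# ...so indexing it yields the bare value rather than a Series.
name = as_array[0]

print(type(as_array).__name__)
print(name)
```

The array alone still carries a dtype when printed, which is why the extra [0] is needed to get just the name.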
CodePudding user response:
Does it work if you use best_company['company'][0]?
CodePudding user response:
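Use Series.iloc or Series.loc:
best_company['company'].iloc[0]
or
best_company['company'].loc[0]
Note that .iloc[0] selects the first row by position, while .loc[0] selects by index label and works here only because the row's label is 0.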
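To illustrate how the two accessors can differ, here is a small sketch with made-up data whose index labels deliberately do not start at 0. The frame, company names, and ratings are all hypothetical.

```python
import pandas as pd

# Hypothetical data: the index labels are deliberately 10, 11, 12 instead of 0, 1, 2.
ratings = pd.DataFrame(
    {"company": ["A", "B", "C"], "avg_rating": [3.0, 3.75, 3.5]},
    index=[10, 11, 12],
)

# nlargest keeps the original index label of the selected row (11 here).
best = ratings.nlargest(1, "avg_rating")

by_position = best["company"].iloc[0]   # first row, whatever its label is
by_label = best["company"].loc[11]      # requires the actual label

# Asking for label 0 fails because no row carries that label.
try:
    best["company"].loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_position, by_label, label_zero_found)
```

When the position of the row is what matters, as with a one-row result, .iloc[0] is the safer of the two.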
CodePudding user response:
The answer was easier than I thought. It is sufficient to write
best_company['company'].values[0]
to extract the value as a string.
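For completeness, a rough sketch of how this might slot into the step that builds the per-country row, so that the cells hold plain values instead of whole Series. The best_company frame here is a hypothetical stand-in with an invented rating, and the variable name row is an assumption for this example.

```python
import pandas as pd

# Hypothetical stand-in for the one-row result of nlargest(1, ['avg_rating']);
# the rating value is invented for this example.
best_company = pd.DataFrame({"company": ["Dole (Guittard)"], "avg_rating": [4.0]})
c = "U.S.A."

# Pull the scalar out of each one-row column before building the new row.
row = pd.DataFrame({
    "country": [c],
    "best_company": [best_company["company"].values[0]],
    "avg_rating": [best_company["avg_rating"].values[0]],
})

print(row)
```

The resulting row can then be concatenated onto loc_companies as in the question.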