Keeping only the value of the column Pandas DataFrame

Time:02-23

I am trying to extract some values from a Pandas DataFrame, but when I extract them I get not only the content of the column but also its description (index, name and dtype). How can I extract only the content, ideally as a string?

My code is as follows:

import os
import webbrowser
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
dataset_path = os.path.join(os.getcwd(), "../data/flavors_of_cacao.csv")
chocodata = pd.read_csv(dataset_path)

# Rename the columns to better manage them
# The last name must differ from 'bean_origin', otherwise two columns share one name
new_colnames = ['company', 'bean_origin', 'ref', 'review_date', 'cocoa_percent',
                'company_location', 'rating', 'bean_type', 'broad_bean_origin']
chocodata = chocodata.rename(columns=dict(zip(chocodata.columns, new_colnames)))

# Get all the unique values of the company column (416 in total)
companies = chocodata.company.unique()

# Get all the unique values of the company_location column (60 in total)
countries = chocodata.company_location.unique()

# Create a dataframe in which to save the best company for each country
d = {'country': [],
     'best_company': [],
     'avg_rating': []}
loc_companies = pd.DataFrame(d)

# Create a dataframe in which to save the companies and their average evaluations
d = {'company': [],
     'avg_rating': []}
avg_companies = pd.DataFrame(d)

# Set the country (for testing purposes)
c = 'U.S.A.'
companies_country = chocodata.loc[chocodata['company_location'] == c]

# For each distinct company in country c, compute the average rating of its bars
for comp in companies_country.company.unique():
    company = companies_country[companies_country.company == comp]
    mean_rating = company['rating'].mean()
    d = pd.DataFrame(
        {'company': [comp],
         'avg_rating': [mean_rating]})
    tmp = [avg_companies, d]
    avg_companies = pd.concat(tmp)

# Remove the duplicate rows, keeping one row per company
# (keep=False would drop every company that appears more than once)
avg_companies.drop_duplicates(subset='company', keep='first', inplace=True)

# Extract the best company for country c
best_company = avg_companies.nlargest(1, ['avg_rating'])

# Insert the company in the dataframe
d = pd.DataFrame(
        {'country': [c],
         'best_company': [best_company['company']],
         'avg_rating': [best_company['avg_rating']]})
tmp = [loc_companies, d]
loc_companies = pd.concat(tmp)

If I print the best_company['company'] variable, then I obtain the following:

0    Dole (Guittard)
Name: company, dtype: object

However, I would like to keep only the Dole (Guittard) name as a string, instead of all this content. Could someone help me?

CodePudding user response:

You should use to_numpy() for this instead of .values, whose use is discouraged. The pandas documentation says:

We recommend using DataFrame.to_numpy() instead.

best_company['company'].to_numpy() returns a one-element array, so best_company['company'].to_numpy()[0] should give you what you're looking for.
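
A minimal sketch of this suggestion, assuming a made-up one-row frame in place of the asker's nlargest result (the rating value is invented for illustration). It shows the extra indexing step that turns the array into a bare string:

```python
import pandas as pd

# Hypothetical stand-in for the one-row result of nlargest(1, ['avg_rating'])
best_company = pd.DataFrame({"company": ["Dole (Guittard)"], "avg_rating": [4.0]})

arr = best_company["company"].to_numpy()  # one-element NumPy array
name = arr[0]                             # the bare string inside it
print(type(arr).__name__, name)
```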

CodePudding user response:

Does it work if you use best_company['company'][0]?

CodePudding user response:

Use Series.iloc or Series.loc:

best_company['company'].iloc[0]

OR

best_company['company'].loc[0]
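
The two are not interchangeable in general: .iloc[0] picks the first row by position, while .loc[0] looks up the index label 0. In the question the label happens to be 0 because every frame passed to pd.concat kept its own index. A small sketch with invented data, where the labels are deliberately not 0, may make the difference clearer:

```python
import pandas as pd

# Hypothetical averages; the index labels 7 and 3 are deliberately not 0
avg_companies = pd.DataFrame(
    {"company": ["A", "B"], "avg_rating": [3.0, 3.5]},
    index=[7, 3],
)
best_company = avg_companies.nlargest(1, "avg_rating")

by_position = best_company["company"].iloc[0]  # first row, whatever its label
by_label = best_company["company"].loc[3]      # row whose index label is 3
has_label_zero = 0 in best_company.index       # False here, so .loc[0] would raise KeyError
```

If the index cannot be trusted to contain 0, .iloc[0] is probably the safer choice.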

CodePudding user response:

The answer was easier than I thought. It is sufficient to write

best_company['company'].values[0]

to extract the value as a string.
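
With scalars extracted this way, the final row can be built from plain values instead of whole Series. The following is only a sketch with invented numbers standing in for the real data. It also uses Series.item(), an alternative that raises ValueError unless the Series holds exactly one element:

```python
import pandas as pd

# Hypothetical one-row result standing in for nlargest(1, ['avg_rating'])
best_company = pd.DataFrame({"company": ["Dole (Guittard)"], "avg_rating": [4.0]})

c = "U.S.A."
row = pd.DataFrame(
    {"country": [c],
     "best_company": [best_company["company"].values[0]],  # plain string
     "avg_rating": [best_company["avg_rating"].item()]})   # plain float
```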
