Home > database >  Easy code adding additional information into the dataframe
Easy code adding additional information into the dataframe

Time:02-04

Is there another way to add the information to the datafame, an easier way?

I have a dataframe which looks like the following:

             Product  Price
2023-01-03     Apple   2.00
2023-01-04     Apple   2.10
2023-01-05     Apple   1.90
2023-01-03    Banana   1.10
2023-01-04    Banana   1.15
2023-01-05    Banana   1.30
2023-01-03  Cucumber   0.50
2023-01-04  Cucumber   0.80
2023-01-05  Cucumber   0.55

I have two additional information:

Apple       Fuit
Banana      Fruit
Cucumber    Vegetable

and

Apple       UK
Banana      Columbia
Cucumber    Mexico

this needs to be added in an additional column as shown below in the results.

            Product  Price  Category   Country
2023-01-03     Apple   2.00      Fuit        UK
2023-01-04     Apple   2.10      Fuit        UK
2023-01-05     Apple   1.90      Fuit        UK
2023-01-03    Banana   1.10      Fuit  Columbia
2023-01-04    Banana   1.15      Fuit  Columbia
2023-01-05    Banana   1.30      Fuit  Columbia
2023-01-03  Cucumber   0.50  Vegtable    Mexico
2023-01-04  Cucumber   0.80  Vegtable    Mexico
2023-01-05  Cucumber   0.55  Vegtable    Mexico

The code lokks like the following:

import pandas as pd

df = pd.DataFrame({
                   'Product': ['Apple', 'Apple', 'Apple',
                               'Banana', 'Banana', 'Banana',
                               'Cucumber', 'Cucumber', 'Cucumber'],
                   'Price': [2.0, 2.1, 1.9,
                             1.1, 1.15, 1.3,
                             0.5, 0.8, 0.55]},
    index=['2023-01-03', '2023-01-04', '2023-01-05',
            '2023-01-03', '2023-01-04', '2023-01-05',
            '2023-01-03', '2023-01-04', '2023-01-05'],
)

df.index = pd.to_datetime(df.index)
print(df)

def category(row):
    if row['Product'] == 'Apple':
        return 'Fuit'
    elif row['Product'] == 'Banana':
        return 'Fuit'
    else:
        return 'Vegtable'

def country(row):
    if row['Product'] == 'Apple':
        return 'UK'
    elif row['Product'] == 'Banana':
        return 'Columbia'
    else:
        return 'Mexico'

df['Category'] = df.apply(lambda row: category(row), axis=1)
df['Country'] = df.apply(lambda row: country(row), axis=1)

print(df)

CodePudding user response:

You can put the infos in a nested dict, then make a intermediate DataFrame for the merge :

infos = pd.DataFrame({"Category": {"Apple": "Fuit", "Banana": "Fruit", "Cucumber": "Vegetable"},
                      "Country": {"Apple": "UK", "Banana": "Columbia", "Cucumber": "Mexico"}})
​
​
df = df.merge(infos, left_on="Product", right_index=True, how="left")
​

Output :

print(df)
             Product  Price   Category   Country
2023-01-03     Apple   2.00       Fuit        UK
2023-01-04     Apple   2.10       Fuit        UK
2023-01-05     Apple   1.90       Fuit        UK
2023-01-03    Banana   1.10      Fruit  Columbia
2023-01-04    Banana   1.15      Fruit  Columbia
2023-01-05    Banana   1.30      Fruit  Columbia
2023-01-03  Cucumber   0.50  Vegetable    Mexico
2023-01-04  Cucumber   0.80  Vegetable    Mexico
2023-01-05  Cucumber   0.55  Vegetable    Mexico

CodePudding user response:

df['Category'] = df.apply(lambda row: 'Fruit' if (row['Product'] == 'Apple') or (row['Product'] == 'Banana') else 'Vegetable', axis=1)
df['Country'] = df.apply(lambda row: 'UK' if row['Product'] == 'Apple' else ('Columbia' if row['Product'] == 'Banana' else 'Mexico'), axis=1)
  • Related