Filling values based on column name

Time:02-05

I have this simple data frame:

import numpy as np
import pandas as pd

data = {'Name':['Karan','Rohit','Sahil','Aryan'],'Age':[23,22,21,23]}

df = pd.DataFrame(data)

I would like to create new columns, one for each value in column Age, and insert 1 wherever the column name matches that row's value in Age.

Like this:

    Name  Age    21    22    23
0  Karan   23  None  None     1
1  Rohit   22  None     1  None
2  Sahil   21     1  None  None
3  Aryan   23  None  None     1

I have tried:

def data_categorical_check(df, column_cat):
    unique_val = np.unique(np.array(df.iloc[:, [column_cat]]))
    x = None

    for i in range(len(unique_val)):
        x = str(unique_val[i])
    
        df[x] = None
        df[x]=[ int(i == unique_val[i]) for i in df["age"]]  
    return df

This creates the columns correctly, but I am not able to insert the values correctly. I am looking for a general solution: I would like to pass the column to check as the argument 'column_cat'.
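For reference, the attempt above has two problems: it reads df["age"] although the column is named "Age", and inside the list comprehension (which has its own scope in Python 3) i is the age value rather than the loop index, so unique_val[i] indexes the array with an age. A minimal corrected sketch follows. It reuses the question's function name, but it assumes column_cat is a column label such as "Age" (not a position, as df.iloc in the original expects) and that non-matching cells should be None, as in the desired output.

```python
import numpy as np
import pandas as pd


def data_categorical_check(df, column_cat):
    """Return a copy of df with one indicator column per unique value of column_cat.

    column_cat is assumed to be a column label such as "Age".
    Cells are 1 where the row's value equals the new column's name, else None.
    """
    out = df.copy()  # leave the caller's frame untouched
    for value in np.unique(out[column_cat]):
        # Vectorized comparison over the whole column: there is no row loop,
        # so no loop variable can be shadowed by accident.
        out[value] = np.where(out[column_cat] == value, 1, None)
    return out


data = {'Name': ['Karan', 'Rohit', 'Sahil', 'Aryan'], 'Age': [23, 22, 21, 23]}
df = pd.DataFrame(data)
print(data_categorical_check(df, 'Age'))
```

Because np.where mixes 1 and None, the new columns have object dtype; use np.nan instead of None if numeric columns are preferred.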

Answer:

Encode the values with get_dummies, mask the zeros, and join the result back to the original dataframe:

s = pd.get_dummies(df['Age'], dtype=int)
df.join(s[s != 0])

    Name  Age   21   22   23
0  Karan   23  NaN  NaN  1.0
1  Rohit   22  NaN  1.0  NaN
2  Sahil   21  1.0  NaN  NaN
3  Aryan   23  NaN  NaN  1.0
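Since the question asks for a general version that takes the column as an argument, the same get_dummies approach can be wrapped in a small helper. This is a sketch; the name add_indicator_columns is hypothetical. Passing dtype=int keeps the dummies numeric (pandas 2.0 changed the get_dummies default to booleans), so masking the zeros yields the 1.0/NaN layout shown above.

```python
import pandas as pd


def add_indicator_columns(df, column_cat):
    """Join one 1.0/NaN indicator column per unique value of column_cat (hypothetical helper)."""
    # dtype=int gives 0/1 integers instead of booleans.
    dummies = pd.get_dummies(df[column_cat], dtype=int)
    # where() keeps the 1s and turns every 0 into NaN, which makes the columns float.
    return df.join(dummies.where(dummies != 0))


df = pd.DataFrame({'Name': ['Karan', 'Rohit', 'Sahil', 'Aryan'], 'Age': [23, 22, 21, 23]})
print(add_indicator_columns(df, 'Age'))
```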

Answer:

Use pd.crosstab:

>>> pd.concat([df, pd.crosstab(df.index, df.Age)], axis=1)

    Name  Age  21  22  23
0  Karan   23   0   0   1
1  Rohit   22   0   1   0
2  Sahil   21   1   0   0
3  Aryan   23   0   0   1

# OR

>>> pd.concat([df, pd.crosstab(df.index, df.Age).mask(lambda x: x==0)], axis=1)

    Name  Age   21   22   23
0  Karan   23  NaN  NaN  1.0
1  Rohit   22  NaN  1.0  NaN
2  Sahil   21  1.0  NaN  NaN
3  Aryan   23  NaN  NaN  1.0
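The crosstab variant can be parameterized the same way. In this sketch the helper name add_crosstab_columns and the mask_zeros flag are both hypothetical. It assumes the frame has a unique index: crosstab counts occurrences per index label, so a repeated label would produce counts above 1.

```python
import pandas as pd


def add_crosstab_columns(df, column_cat, mask_zeros=False):
    """Append one count column per unique value of column_cat (hypothetical helper)."""
    # Rows are the index labels, columns are the category values, cells are counts.
    counts = pd.crosstab(df.index, df[column_cat])
    if mask_zeros:
        # 0 -> NaN, leaving 1.0 where the row matches the column.
        counts = counts.mask(counts == 0)
    # axis=1 aligns the counts with df on the index labels.
    return pd.concat([df, counts], axis=1)


df = pd.DataFrame({'Name': ['Karan', 'Rohit', 'Sahil', 'Aryan'], 'Age': [23, 22, 21, 23]})
print(add_crosstab_columns(df, 'Age'))
print(add_crosstab_columns(df, 'Age', mask_zeros=True))
```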

Answer:

You can write a function that returns the row with the new column added:

def data_categorical_check(row):
    row[str(row["Age"])] = 1
    return row

Then apply it to each row with the apply method:

df.apply(data_categorical_check, axis=1)
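To match the question's request for a column argument, the row function can take the column name as an extra keyword (column_cat here, borrowed from the question), which DataFrame.apply forwards to the function. This sketch also copies the row before changing it, since the pandas documentation warns that functions which mutate the object passed to apply are not supported. Note that this approach produces string column names ('21', '22', '23') and fills non-matching cells with NaN.

```python
import pandas as pd


def data_categorical_check(row, column_cat='Age'):
    """Return a copy of row with a new entry named after its column_cat value."""
    row = row.copy()  # don't mutate the Series that apply passes in
    row[str(row[column_cat])] = 1
    return row


df = pd.DataFrame({'Name': ['Karan', 'Rohit', 'Sahil', 'Aryan'], 'Age': [23, 22, 21, 23]})
# Extra keyword arguments given to apply are passed through to the function.
result = df.apply(data_categorical_check, axis=1, column_cat='Age')
print(result)
```

The order of the new columns in the result is decided by pandas when it combines the returned rows; reorder them explicitly if it matters.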