I have this simple data frame:
import numpy as np
import pandas as pd
data = {'Name':['Karan','Rohit','Sahil','Aryan'],'Age':[23,22,21,23]}
df = pd.DataFrame(data)
I would like to create one new column for each distinct value in column Age, and insert 1 in a row wherever the new column's name matches that row's value in Age,
like this:
    Name  Age    21    22    23
0  Karan   23  None  None     1
1  Rohit   22  None     1  None
2  Sahil   21     1  None  None
3  Aryan   23  None  None     1
I have tried
def data_categorical_check(df, column_cat):
    unique_val = np.unique(np.array(df.iloc[:, [column_cat]]))
    x = None
    for i in range(len(unique_val)):
        x = str(unique_val[i])
        df[x] = None
        df[x]=[ int(i == unique_val[i]) for i in df["age"]]
    return df
This creates the columns correctly, but I am not able to insert the values correctly. I am looking for a general solution in which the column to check is passed in the argument column_cat.
CodePudding user response:
Encode the values using pd.get_dummies, then mask the zeros and join the result back to the original dataframe:
s = pd.get_dummies(df['Age'])
df.join(s[s != 0])
    Name  Age   21   22   23
0  Karan   23  NaN  NaN  1.0
1  Rohit   22  NaN  1.0  NaN
2  Sahil   21  1.0  NaN  NaN
3  Aryan   23  NaN  NaN  1.0
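Since the question asks for a general function, the same idea can be wrapped so that the column is an argument. This is only a sketch: it assumes column_cat is a column name (not a position), and it passes dtype=int to get_dummies, because newer pandas versions return booleans by default and would otherwise show True instead of 1.

```python
import pandas as pd

def data_categorical_check(df, column_cat):
    # One indicator column per distinct value; dtype=int gives 0/1, not booleans.
    dummies = pd.get_dummies(df[column_cat], dtype=int)
    # where() keeps the 1s and replaces the 0s with NaN.
    return df.join(dummies.where(dummies != 0))

data = {'Name': ['Karan', 'Rohit', 'Sahil', 'Aryan'], 'Age': [23, 22, 21, 23]}
df = pd.DataFrame(data)
result = data_categorical_check(df, 'Age')
print(result)
```

The new column names are the original values (here the integers 21, 22, 23), not strings. Drop the where() call if 0 is acceptable in place of NaN.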
CodePudding user response:
Use pd.crosstab:
>>> pd.concat([df, pd.crosstab(df.index, df.Age)], axis=1)
    Name  Age  21  22  23
0  Karan   23   0   0   1
1  Rohit   22   0   1   0
2  Sahil   21   1   0   0
3  Aryan   23   0   0   1
# OR
>>> pd.concat([df, pd.crosstab(df.index, df.Age).mask(lambda x: x==0)], axis=1)
    Name  Age   21   22   23
0  Karan   23  NaN  NaN  1.0
1  Rohit   22  NaN  1.0  NaN
2  Sahil   21  1.0  NaN  NaN
3  Aryan   23  NaN  NaN  1.0
CodePudding user response:
You can do it by creating a function that returns the row with the new column added:
def data_categorical_check(row):
    row[str(row["Age"])]=1
    return row
Then apply it to every row with the apply method:
df.apply(data_categorical_check, axis=1)
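To make the row-wise approach general, the column name can be passed through apply as a keyword argument. The following is a rough sketch, not a tuned solution: add_indicator and the column_cat keyword are names made up for illustration, and the copy is just a precaution against modifying the row object that pandas hands in.

```python
import pandas as pd

def add_indicator(row, column_cat):
    # Work on a copy, then add a column named after this row's value.
    row = row.copy()
    row[str(row[column_cat])] = 1
    return row

data = {'Name': ['Karan', 'Rohit', 'Sahil', 'Aryan'], 'Age': [23, 22, 21, 23]}
df = pd.DataFrame(data)

# Extra keyword arguments given to apply are forwarded to the function.
result = df.apply(add_indicator, axis=1, column_cat='Age')
print(result)
```

apply returns a new data frame, so the result has to be assigned. The new column names are strings here ('21', '22', '23'), cells without a match come out as NaN, and the column order may differ from the original. On large frames a row-wise apply is usually much slower than get_dummies or crosstab.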