If a DataFrame column contains both numerical and categorical values, I want to separate them into two columns, with the numerical values in one column and the categorical values in the other, as shown below.
import pandas as pd
df = pd.DataFrame({"col": ['a', '1', 'b', '2', '3', 'c', 'd', 'e']})
I wrote a function that does exactly what I want for a list, but I am unable to apply it to DataFrame columns the way get_dummies does, e.g. pd.get_dummies(df, prefix=['A', 'D'], columns=['A', 'D']).
CodePudding user response:
Use a regex (the pattern needs to change if the numeric values are not only integers):
df['col'].str.extract(r'(^\d+$)?(.+)?').fillna('')
out:
   0  1
0     a
1  1
2     b
3  2
4  3
5     c
6     d
7     e
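The note above says the pattern has to change when the numbers are not plain integers. A minimal sketch of one such pattern, assuming the numeric values are written as an optional sign, digits, and an optional decimal part (the sample data and the column names `numbers` and `categories` are made up for illustration):

```python
import pandas as pd

# Illustrative data with a decimal and a negative number (not from the question).
df = pd.DataFrame({"col": ['a', '1.5', 'b', '-2', '3', 'c']})

# Group 1: optional sign, digits, optional decimal part, anchored to the whole string.
# Group 2: anything else, used only when group 1 did not match.
pattern = r'(^[+-]?\d+(?:\.\d+)?$)?(.+)?'

out = df['col'].str.extract(pattern).fillna('')
out.columns = ['numbers', 'categories']
print(out)
```

This pattern does not cover other notations such as exponents (`1e3`) or thousands separators; those would need a further change to the first group.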
If you don't want to use regex, use boolean masking.
cond = df['col'].str.isdigit()
pd.DataFrame([df['col'].where(cond, ''), df['col'].where(~cond, '')]).T
This gives the same result.
CodePudding user response:
numbers = df['col'].str.isnumeric()
df['numbers'] = df.loc[numbers, 'col']
df['categories'] = df.loc[~numbers, 'col']
print(df)
Output:
  col numbers categories
0   a     NaN          a
1   1       1        NaN
2   b     NaN          b
3   2       2        NaN
4   3       3        NaN
5   c     NaN          c
6   d     NaN          d
7   e     NaN          e
CodePudding user response:
Are you happy to have NaN in your numerical column? In this case you can use pd.to_numeric and np.where to achieve your goal.
import pandas as pd
import numpy as np
df = pd.DataFrame({"col": ['a', '1', 'b', '2', '3', 'c', 'd', 'e']})
# Non-numeric strings become NaN
df[0] = pd.to_numeric(df['col'], errors='coerce')
# Keep the original value only where the conversion failed
df[1] = np.where(df[0].isnull(), df['col'], '')
The output is:
  col    0  1
0   a  NaN  a
1   1  1.0
2   b  NaN  b
3   2  2.0
4   3  3.0
5   c  NaN  c
6   d  NaN  d
7   e  NaN  e
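Since the question asks for something that can be applied to several columns the way get_dummies takes columns=[...], the pd.to_numeric approach can be wrapped in a small function. This is only a sketch: the function name `split_numeric_categorical`, the `_num` / `_cat` suffixes, and the sample data are hypothetical, not part of pandas or of the question.

```python
import pandas as pd


def split_numeric_categorical(df, columns):
    """Hypothetical helper: add <col>_num and <col>_cat for each listed column."""
    out = df.copy()
    for col in columns:
        # Values that cannot be parsed as numbers become NaN.
        as_number = pd.to_numeric(out[col], errors='coerce')
        is_number = as_number.notna()
        out[f'{col}_num'] = as_number
        # Keep the original value only where the conversion failed.
        out[f'{col}_cat'] = out[col].where(~is_number)
    return out


# Illustrative data; the column names A and D follow the get_dummies call in the question.
df = pd.DataFrame({"A": ['a', '1', 'b', '2.5'], "D": ['x', 'y', '-3', 'z']})
result = split_numeric_categorical(df, columns=['A', 'D'])
print(result)
```

Because it relies on pd.to_numeric, this version also treats decimals and negative numbers as numeric. It leaves NaN in the non-matching cells; chain .fillna('') on the categorical columns if empty strings are preferred.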
CodePudding user response:
Another solution, using .pivot:
print(
    df.assign(i=(~df["col"].str.isnumeric()).astype(int))
    .pivot(columns="i", values="col")
    .fillna("")
    .rename_axis(columns=None)
)
Prints:
   0  1
0     a
1  1
2     b
3  2
4  3
5     c
6     d
7     e