Is there a Python function that separates numerical and categorical values into separate columns?

If a DataFrame column contains both numerical and categorical values, I want to split them into two columns, with the numerical values in one column and the categorical values in the other. Given this DataFrame:

import pandas as pd
df = pd.DataFrame(
    {"col":['a', '1', 'b', '2', '3', 'c', 'd' ,'e']})

I wrote a function that does exactly what I want for a list, but I am unable to apply it to DataFrame columns the way get_dummies does, e.g. pd.get_dummies(df, prefix=['A', 'D'], columns=['A', 'D']).

CodePudding user response：

Use a regex (the pattern needs to change if the numeric values are not all integers):

df['col'].str.extract(r'(^\d+$)?(.+)?').fillna('')

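This returns a two-column frame: the digit-only strings in column 0 and everything else in column 1, with empty strings in the unused cells.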

If you don't want to use regex, use boolean masking.

cond = df['col'].str.isdigit()
pd.DataFrame([df['col'].where(cond, ''), df['col'].where(~cond, '')]).T

This gives the same result.
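
As a rough sketch of the note above about changing the pattern for non-integer values: the pattern below is an assumption (optional minus sign, digits, optional fractional part), and the sample data with "-2" and "3.5" is made up for illustration. It uses two anchored alternatives so a value lands in exactly one group.

```python
import pandas as pd

# Hypothetical sample data with a negative and a decimal value added.
df = pd.DataFrame({"col": ["a", "1", "b", "-2", "3.5", "c", "d1", "e"]})

# Assumed numeric form: optional sign, digits, optional fractional part.
# Group 1 captures a whole-string number, group 2 captures anything else.
pattern = r"^(-?\d+(?:\.\d+)?)$|^(.+)$"

out = df["col"].str.extract(pattern).fillna("")
out.columns = ["numbers", "categories"]
print(out)
```

A mixed value such as "d1" ends up in the categories column, because the numeric alternative has to match the entire string.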

CodePudding user response：

numbers = df['col'].str.isnumeric()
df['numbers'] = df.loc[numbers, 'col']
df['categories'] = df.loc[~numbers, 'col']
print(df)

Output:

  col numbers categories
0   a     NaN          a
1   1       1        NaN
2   b     NaN          b
3   2       2        NaN
4   3       3        NaN
5   c     NaN          c
6   d     NaN          d
7   e     NaN          e
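
One thing to keep in mind with the string checks used above: str.isdigit and str.isnumeric only look at the characters, so signed or decimal values count as non-numeric. A small comparison on made-up values, with pd.to_numeric shown alongside as one alternative test:

```python
import pandas as pd

# Hypothetical values: an integer, a negative, a decimal and a letter.
s = pd.Series(["1", "-2", "3.5", "a"])

checks = pd.DataFrame({
    "value": s,
    "isdigit": s.str.isdigit(),      # character-based check
    "isnumeric": s.str.isnumeric(),  # character-based check
    # True where the string can be parsed as a number
    "parses": pd.to_numeric(s, errors="coerce").notna(),
})
print(checks)
```

For data that contains only non-negative integers, as in the question, all three checks agree.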

CodePudding user response：

Are you happy to have NaN in your numerical column? If so, you can use
pd.to_numeric and np.where to achieve your goal.

import pandas as pd
import numpy as np
df = pd.DataFrame(
    {"col":['a', '1', 'b', '2', '3', 'c', 'd' ,'e']})

df[0] = pd.to_numeric(df['col'], errors='coerce')
df[1] = np.where(
    df[0].isnull(),
    df['col'],
    '')

The output is:

  col    0  1
0   a  NaN  a
1   1  1.0   
2   b  NaN  b
3   2  2.0   
4   3  3.0   
5   c  NaN  c
6   d  NaN  d
7   e  NaN  e
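
Since the question asks for something that can be applied to several columns at once, the way get_dummies takes a columns list, here is a minimal sketch of such a wrapper built on pd.to_numeric. The function name split_numeric_categorical, the _num/_cat suffixes, the fill_value parameter and the sample frame are all hypothetical choices, not part of pandas.

```python
import pandas as pd


def split_numeric_categorical(df, columns, fill_value=""):
    """Hypothetical helper: add <name>_num and <name>_cat for each column."""
    result = df.copy()
    for name in columns:
        values = result[name].astype(str)
        # Anything pd.to_numeric cannot parse becomes NaN, i.e. "categorical".
        is_number = pd.to_numeric(values, errors="coerce").notna()
        result[f"{name}_num"] = values.where(is_number, fill_value)
        result[f"{name}_cat"] = values.where(~is_number, fill_value)
    return result


# Made-up example with two mixed columns.
df = pd.DataFrame({
    "A": ["a", "1", "b", "2"],
    "D": ["x", "y", "7", "8.5"],
})
out = split_numeric_categorical(df, columns=["A", "D"])
print(out)
```

The numeric values are kept as strings here so that both new columns share one dtype; converting the _num column with pd.to_numeric afterwards is an option if real numbers are needed.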

CodePudding user response：

Another solution, using .pivot:

print(
    df.assign(i=(~df["col"].str.isnumeric()).astype(int))
    .pivot(columns="i", values="col")
    .fillna("")
    .rename_axis(columns=None)
)

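This prints a frame with the numeric strings in column 0 and the remaining values in column 1, with empty strings in the unused cells.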