I want to separate the categorical and numeric features from several data frames (df1, df2, df3, and df4, shown below) and store them in two new data frames named "Cont" and "Cat". I am looking for a process that loops over these data frames and produces the output shown below. It should rely purely on the pandas dtypes functionality to decide whether a column is categorical or numeric.
The input data frames look like: df1:
Name1 Number1
ABC 123
DEF 234
XXX 456
df2:
Name2 Number2
ABCD 1232
DEFG 2342
XXXY 4562
df3:
Name3 Number3
AB 12
DE 23
XX 45
df4:
Name4 Number4
A 1
D 3
X 5
The output should look like:
Cat:
Name1 Name2 Name3 Name4
ABC ABCD AB A
DEF DEFG DE D
XXX XXXY XX X
and similarly: Cont:
Number1 Number2 Number3 Number4
123 1232 12 1
234 2342 23 3
456 4562 45 5
How can this be achieved?
CodePudding user response:
You can use pandas.DataFrame.select_dtypes to create the two data frames.
Try this:
import numpy as np
import pandas as pd

out = pd.concat([df1, df2, df3, df4], axis=1)  # place the frames side by side
cat = out.select_dtypes(include="object")  # or include="category"
cont = out.select_dtypes(include=np.number)  # all numeric columns
# Output :
print(cat)
Name1 Name2 Name3 Name4
0 ABC ABCD AB A
1 DEF DEFG DE D
2 XXX XXXY XX X
print(cont)
Number1 Number2 Number3 Number4
0 123 1232 12 1
1 234 2342 23 3
2 456 4562 45 5
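Since the question asks for something that loops over the data frames, here is a minimal self-contained sketch of that variant. It rebuilds the sample inputs from the question, splits each frame with select_dtypes inside a loop, and concatenates the pieces column-wise. Two assumptions that are not in the original: every non-numeric column is treated as categorical (via exclude="number", which would also catch booleans or datetimes), and all frames share the same row index.

```python
import pandas as pd

# Sample inputs rebuilt from the question
df1 = pd.DataFrame({"Name1": ["ABC", "DEF", "XXX"], "Number1": [123, 234, 456]})
df2 = pd.DataFrame({"Name2": ["ABCD", "DEFG", "XXXY"], "Number2": [1232, 2342, 4562]})
df3 = pd.DataFrame({"Name3": ["AB", "DE", "XX"], "Number3": [12, 23, 45]})
df4 = pd.DataFrame({"Name4": ["A", "D", "X"], "Number4": [1, 3, 5]})

frames = [df1, df2, df3, df4]

cat_parts, cont_parts = [], []
for df in frames:
    # Numeric columns go to Cont
    cont_parts.append(df.select_dtypes(include="number"))
    # Assumption: anything that is not numeric counts as categorical
    cat_parts.append(df.select_dtypes(exclude="number"))

# Stitch the per-frame pieces together side by side
Cat = pd.concat(cat_parts, axis=1)
Cont = pd.concat(cont_parts, axis=1)

print(Cat)
print(Cont)
```

For frames that line up row by row, this should give the same result as concatenating first and selecting afterwards; the loop form is mainly useful if each frame needs its own handling before being combined.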