How to cbind (concat) 3 dataframes in Python/pandas

Time:10-29

I have a dataframe:

import pandas as pd

iris=pd.read_csv("https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv")
iris.tail(5)
iris.head(5)

From the iris dataframe I derived three dataframes, df_setosa, df_virginica, and df_versicolor:

df_setosa = iris[iris['variety'] == 'Setosa']
df_virginica = iris[iris['variety'] == 'Virginica']
df_versicolor = iris[iris['variety'] == 'Versicolor']

# append the corresponding variety name as a suffix to each dataframe's column names
df_setosa = df_setosa.add_suffix('_setosa')
df_virginica = df_virginica.add_suffix('_virginica')
df_versicolor = df_versicolor.add_suffix('_versicolor')

print(df_virginica.columns)
print(df_versicolor.columns)
print(df_setosa.columns)

print(df_setosa.shape) # 50 rows by 5 columns
print(df_versicolor.shape) # 50 rows by 5 columns
print(df_virginica.shape) # 50 rows by 5 columns

Since each dataframe has shape (50, 5), I want to concatenate the three dataframes column-wise (cbind, as we say in R).

My attempt:

# I need help concatenating the three dataframes
concat_df = pd.concat([df_setosa, df_virginica, df_versicolor])  # this returns a lot of NaN
concat_df.shape  # this returns (150, 15) instead of (50, 15)

concat_df should have 50 rows and 15 columns.

Thanks in advance

CodePudding user response:

When you create the "sub" dataframes, reset their indexes, since there's no reason to keep the index of the original iris set in this case:

df_setosa = iris[iris['variety'] == 'Setosa'].reset_index(drop=True)
df_virginica = iris[iris['variety'] == 'Virginica'].reset_index(drop=True)
df_versicolor = iris[iris['variety'] == 'Versicolor'].reset_index(drop=True)

Then, when you concat, make sure you concatenate horizontally by setting the "axis" argument to 1, like so:

concat_df = pd.concat([df_setosa, df_virginica, df_versicolor], axis=1)
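
Putting the two steps together with the suffixes from the question, here is a minimal sketch. The `toy` dataframe is a made-up stand-in for iris (2 rows per variety instead of 50) so it runs without downloading anything; `parts` and `wide` are names I picked for illustration.

```python
import pandas as pd

# Hypothetical stand-in for iris: 3 varieties, 2 rows each (the real set has 50 each).
toy = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"],
})

parts = []
for name in ["Setosa", "Virginica", "Versicolor"]:
    # Filter one variety and renumber its rows from 0 so the pieces line up.
    part = toy[toy["variety"] == name].reset_index(drop=True)
    # Suffix the column names so the combined frame has unique columns.
    parts.append(part.add_suffix("_" + name.lower()))

# axis=1 places the frames side by side instead of stacking them.
wide = pd.concat(parts, axis=1)
print(wide.shape)
```

With the real iris data the same pattern should give 50 rows and 15 columns.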

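You can also defer the "reset_index" call to the concat step, as long as each dataframe's index is reset before it is passed to pd.concat. If you skip the reset, the result will still have 150 rows: concat aligns the rows on the original index labels 0 to 149, each dataframe only has values for its own 50 labels, and the rest is filled with NaNs.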
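
To see the index alignment in isolation, here is a small sketch with two made-up dataframes (`left` and `right` are hypothetical, not part of the iris code) whose indexes don't overlap, the same way the three filtered dataframes keep their original row labels:

```python
import pandas as pd

# Hypothetical frames with non-overlapping indexes, like slices of a larger frame.
left = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"b": [3, 4]}, index=[2, 3])

# Without a reset, concat takes the union of the index labels and pads with NaN.
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)                    # (4, 2)
print(int(misaligned.isna().sum().sum()))  # 4

# Resetting each index inside the concat call lines the rows up by position.
aligned = pd.concat([df.reset_index(drop=True) for df in (left, right)], axis=1)
print(aligned.shape)                       # (2, 2)
```

Note that resetting the index of the already concatenated result would not help, since the NaNs are created during the concat itself.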
