I have a dataframe:
import pandas as pd
iris=pd.read_csv("https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv")
iris.tail(5)
iris.head(5)
From the iris dataframe I derived three dataframes: df_setosa, df_virginica, and df_versicolor.
df_setosa = iris[iris['variety'] == 'Setosa']
df_virginica = iris[iris['variety'] == 'Virginica']
df_versicolor = iris[iris['variety'] == 'Versicolor']
# append the corresponding variety name as a suffix to each dataframe's column names
df_setosa = df_setosa.add_suffix('_setosa')
df_virginica = df_virginica.add_suffix('_virginica')
df_versicolor = df_versicolor.add_suffix('_versicolor')
print(df_virginica.columns)
print(df_versicolor.columns)
print(df_setosa.columns)
print(df_setosa.shape) # 50 rows by 5 columns
print(df_versicolor.shape) # 50 rows by 5 columns
print(df_virginica.shape) # 50 rows by 5 columns
Since each dataframe has a shape of (50, 5), I want to concatenate (or, as we say in R, cbind) the three dataframes.
My attempt:
#### I need help concatenating the three dataframes
concat_df = pd.concat([df_setosa,df_virginica,df_versicolor]) # this returns a lot of NaN
concat_df.shape # this returns a shape of 150 rows by 15 columns instead of 50 rows by 15 columns
concat_df should have a shape of 50 rows by 15 columns.
Thanks in advance
CodePudding user response:
When you create the "sub" dataframes, reset their indexes, since there's no reason to keep the index of the original iris set in this case:
df_setosa = iris[iris['variety'] == 'Setosa'].reset_index(drop=True)
df_virginica = iris[iris['variety'] == 'Virginica'].reset_index(drop=True)
df_versicolor = iris[iris['variety'] == 'Versicolor'].reset_index(drop=True)
Then when you concat, make sure you concatenate horizontally by setting the "axis" argument to 1, like so:
concat_df = pd.concat([df_setosa,df_virginica,df_versicolor], axis=1)
You can also leave the reset_index until this last step. If you don't reset the indexes, the concat will still produce 150 rows, because pandas aligns the frames on their index values (0 to 149) and fills the non-matching positions with NaN.
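Putting the whole recipe together, here is a minimal, self-contained sketch. It uses a small synthetic frame in place of the remote iris CSV (so it runs offline); the column name, variety labels, and suffixes mirror the ones in the question:

```python
import pandas as pd

# Small synthetic stand-in for the iris data: 2 rows per variety.
iris = pd.DataFrame({
    'sepal.length': [5.1, 6.3, 7.0, 4.9, 5.8, 6.4],
    'variety': ['Setosa', 'Virginica', 'Versicolor',
                'Setosa', 'Virginica', 'Versicolor'],
})

# Split by variety, reset each index so all three frames start at 0,
# and tag the columns with the variety name.
parts = []
for name in ['Setosa', 'Virginica', 'Versicolor']:
    sub = iris[iris['variety'] == name].reset_index(drop=True)
    parts.append(sub.add_suffix('_' + name.lower()))

# axis=1 stacks the frames side by side (R's cbind). Because the
# indexes now match, every row lines up and no NaNs are introduced.
concat_df = pd.concat(parts, axis=1)
print(concat_df.shape)  # (2, 6): 2 rows per variety, 3 frames x 2 columns
```

With the full iris data (50 rows per variety, 5 columns each) the same code yields the (50, 15) shape the question asks for.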