Edited:
I have a pandas dataframe as follows:
Class Sex SibSp Fare
0 0 0 0 0
2 2 2 2 2
3 3 3 3 3
5 5 5 5 5
I have another pandas dataframe as follows:
Class Sex SibSp Fare
1 1 1 1 1
4 4 4 4 4
If I concatenate these two dataframes using
pd.concat([traindf,testdf])
I get the following result:
Class Sex SibSp Fare
0 0 0 0 0
2 2 2 2 2
3 3 3 3 3
5 5 5 5 5
1 1 1 1 1
4 4 4 4 4
However, I want to get the following result:
Class Sex SibSp Fare
0 0 0 0 0
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
I have tried pd.concat([traindf,testdf]).sort_values()
but this does not work. How can I concatenate the dataframes so that the rows end up ordered by their index values? Thanks.
CodePudding user response:
If you need to sort by index, use:
df = pd.concat([traindf,testdf]).sort_index()
Or, if you need to sort by the column Class, use:
df = pd.concat([traindf,testdf]).sort_values(by=['Class'])
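For reference, here is a minimal sketch that rebuilds the two frames from the question and applies the index sort. The way the frames are constructed is an assumption made only for illustration; the column names and index labels are taken from the question. Note that DataFrame.sort_values() called without arguments raises a TypeError, because the by parameter is required.

```python
import pandas as pd

# Rebuild the frames from the question (construction is illustrative only)
cols = ["Class", "Sex", "SibSp", "Fare"]
traindf = pd.DataFrame({c: [0, 2, 3, 5] for c in cols}, index=[0, 2, 3, 5])
testdf = pd.DataFrame({c: [1, 4] for c in cols}, index=[1, 4])

# concat stacks the rows in the order given; sort_index reorders by index label
combined = pd.concat([traindf, testdf]).sort_index()
print(combined)
```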
CodePudding user response:
If you want to copy all the columns, you can select the matching rows with loc
and overwrite them.
import copy

import numpy as np
import pandas as pd

# Create some dummy dataframes
df1 = pd.DataFrame(
    {
        'Pclass': np.random.randint(0, 10, 10),
        'Fare': np.random.randint(0, 10, 10),
        'Age': np.random.randint(0, 100, 10)
    })
# df2 holds the rows of df1 with an even Fare, with every value scaled by 1.5
df2 = copy.deepcopy(df1[df1['Fare'] % 2 == 0] * 1.5)
print(df1, df2)
# Overwrite the rows of df1 whose index also appears in df2
for i in df2.index:
    if i in df1.index:
        df1.loc[i] = df2.loc[i]
print("After overwrite")
print(df1)
Output:
   Pclass  Fare  Age
0       7     6   25
1       8     3   34
2       0     4   57
3       9     1   98
4       3     5   58
5       8     0   97
6       9     6   53
7       2     0    1
8       0     5   33
9       2     9   36
   Pclass  Fare    Age
0    10.5   9.0   37.5
2     0.0   6.0   85.5
5    12.0   0.0  145.5
6    13.5   9.0   79.5
7     3.0   0.0    1.5
After overwrite
   Pclass  Fare    Age
0    10.5   9.0   37.5
1     8.0   3.0   34.0
2     0.0   6.0   85.5
3     9.0   1.0   98.0
4     3.0   5.0   58.0
5    12.0   0.0  145.5
6    13.5   9.0   79.5
7     3.0   0.0    1.5
8     0.0   5.0   33.0
9     2.0   9.0   36.0
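The row-by-row loop can probably be replaced by a single loc assignment over the shared index labels. This is only a sketch of that alternative, not the answer's original method: it uses small fixed values instead of random ones, and it casts df1 to float up front (an assumption made here) so that the scaled values can be written back without a dtype change.

```python
import pandas as pd

# Fixed illustrative data; cast to float so scaled values fit the columns
df1 = pd.DataFrame(
    {"Pclass": [7, 8, 0, 9], "Fare": [6, 3, 4, 1], "Age": [25, 34, 57, 98]}
).astype(float)
df2 = df1[df1["Fare"] % 2 == 0] * 1.5  # rows with an even Fare, scaled by 1.5

# Overwrite only the rows whose index labels exist in both frames
common = df2.index.intersection(df1.index)
df1.loc[common] = df2.loc[common]
print(df1)
```

Rows whose index is missing from df2 are left untouched, which matches what the loop does.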
CodePudding user response:
You could possibly fill in the ages without splitting the dataframes. But if you have to split them, then you can use the following:
pd.concat([traindf, testdf], sort=False).sort_index()
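As a side note, the sort argument of pd.concat affects the non-concatenation axis (here, the columns) when the frames' columns do not line up; it does not order the rows. The row ordering comes from sort_index(). A small sketch, using hypothetical frames a and b that are not from the question:

```python
import pandas as pd

# Hypothetical frames with different column sets and interleaved index labels
a = pd.DataFrame({"B": [1, 2], "A": [3, 4]}, index=[0, 2])
b = pd.DataFrame({"A": [5, 6], "C": [7, 8]}, index=[1, 3])

# sort=False keeps the columns in order of appearance; sort=True sorts them
kept = pd.concat([a, b], sort=False).sort_index()
ordered = pd.concat([a, b], sort=True).sort_index()
print(kept)
print(ordered)
```

In both cases the rows come back in index order only because of the sort_index() call.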