Concatenating pandas dataframe on basis of increasing index number and retaining their position on ba

Time:10-02

Edited:

I have a pandas dataframe as follows:

  Class  Sex  SibSp Fare
0  0      0     0     0
2  2      2     2     2
3  3      3     3     3
5  5      5     5     5

I have another pandas dataframe as follows:

  Class  Sex  SibSp Fare
1  1      1     1     1
4  4      4     4     4

If I concatenate these two dataframes using

pd.concat([traindf,testdf])

I get the following result:

  Class  Sex  SibSp Fare
0  0      0     0     0
2  2      2     2     2
3  3      3     3     3
5  5      5     5     5
1  1      1     1     1
4  4      4     4     4 

However, I want to get result as follows:

   Class  Sex  SibSp Fare
0  0      0     0     0
1  1      1     1     1
2  2      2     2     2
3  3      3     3     3
4  4      4     4     4
5  5      5     5     5

I tried pd.concat([traindf,testdf]).sort_values(), but this does not work. Any idea how to concatenate the dataframes so that the rows end up ordered by their index numbers? Thanks

CodePudding user response:

If you need to sort by index, use:

df = pd.concat([traindf,testdf]).sort_index() 

Or, if you need to sort by the column Class, use:

df = pd.concat([traindf,testdf]).sort_values(by=['Class']) 
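
As a quick check, here is a minimal sketch that rebuilds the two frames from the question and applies sort_index(). The values are copied from the tables in the question; building the frames with explicit index lists is an assumption about how they were created.

```python
import pandas as pd

cols = ["Class", "Sex", "SibSp", "Fare"]

# Frames rebuilt from the question; each row repeats its index label in every column.
traindf = pd.DataFrame([[i] * 4 for i in (0, 2, 3, 5)], index=[0, 2, 3, 5], columns=cols)
testdf = pd.DataFrame([[i] * 4 for i in (1, 4)], index=[1, 4], columns=cols)

# concat stacks the rows in the order given; sort_index then reorders them by label.
df = pd.concat([traindf, testdf]).sort_index()
print(df)
```

The original attempt fails because DataFrame.sort_values() requires a `by` argument naming the column(s) to sort on; called with no arguments it raises a TypeError rather than falling back to the index.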

CodePudding user response:

If you want to overwrite whole rows of one dataframe with the matching rows of another, you can select each row with loc and assign over it.

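import copy

import numpy as np
import pandas as pd

# Create some dummy dataframes
df1 = pd.DataFrame(
    {
        'Pclass': np.random.randint(0,10,10),
        'Fare': np.random.randint(0,10,10),
        'Age': np.random.randint(0,100,10)
    })
# df2 holds the rows of df1 with an even Fare, with every value scaled by 1.5
df2 = copy.deepcopy(df1[df1['Fare']%2 == 0]*1.5)
print(df1, df2)

# Overwrite df1 with df2, row by row, wherever the index label exists in both
for i in df2.index:
  if i in df1.index:
    df1.loc[i] = df2.loc[i]

print("After overwrite")
print(df1)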

Output:

   Pclass  Fare  Age
0       7     6   25
1       8     3   34
2       0     4   57
3       9     1   98
4       3     5   58
5       8     0   97
6       9     6   53
7       2     0    1
8       0     5   33
9       2     9   36

   Pclass  Fare    Age
0    10.5   9.0   37.5
2     0.0   6.0   85.5
5    12.0   0.0  145.5
6    13.5   9.0   79.5
7     3.0   0.0    1.5

After overwrite

   Pclass  Fare    Age
0    10.5   9.0   37.5
1     8.0   3.0   34.0
2     0.0   6.0   85.5
3     9.0   1.0   98.0
4     3.0   5.0   58.0
5    12.0   0.0  145.5
6    13.5   9.0   79.5
7     3.0   0.0    1.5
8     0.0   5.0   33.0
9     2.0   9.0   36.0
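
The row-by-row loop can probably be replaced by a single label-based assignment. The following is a minimal sketch with small fixed data instead of random values; the cast to float before assigning is an assumption made here so that the 1.5-scaled values fit the column dtype.

```python
import pandas as pd

# Small fixed stand-ins for the random frames used in the answer
df1 = pd.DataFrame({"Pclass": [7, 8, 0, 9], "Fare": [6, 3, 4, 1], "Age": [25, 34, 57, 98]})
df2 = df1[df1["Fare"] % 2 == 0] * 1.5  # rows 0 and 2, scaled by 1.5

# Cast first so the float values from df2 fit without changing dtype on assignment
df1 = df1.astype(float)

# Labels present in both frames; assign all matching rows in one step
common = df1.index.intersection(df2.index)
df1.loc[common] = df2.loc[common]
print(df1)
```

Because the assignment goes through loc, the rows keep their original positions and no sorting is needed afterwards.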

CodePudding user response:

You could possibly fill in the ages without splitting the dataframes. But if you have to split them, then you can use the following:

pd.concat([traindf, testdf], sort=False).sort_index()
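
Note that the sort argument of pd.concat only concerns the columns (the non-concatenation axis) when the frames' columns differ; it does not reorder rows, which is why sort_index() is still needed. A small sketch with made-up frames `a` and `b` (hypothetical names, not from the question):

```python
import pandas as pd

# Two made-up frames whose columns only partly overlap
a = pd.DataFrame({"b": [1], "a": [2]}, index=[3])
b = pd.DataFrame({"a": [4], "c": [5]}, index=[1])

unsorted_cols = pd.concat([a, b], sort=False)  # columns kept unsorted
sorted_cols = pd.concat([a, b], sort=True)     # columns sorted alphabetically

# In both cases the rows stay in concatenation order: index 3, then 1
print(unsorted_cols.index.tolist(), sorted_cols.index.tolist())

# Sorting the rows by label is a separate step
by_index = unsorted_cols.sort_index()
```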