Join data frames and reset a column-CodePudding

For the data frames like

   ID Name  Time
0   0    A   100
1   1    B    70
   ID Name  Time
0   0    C    40
1   1    D    90

I want to join them by rows and reset the ID numbers. So the final data frame should be

   ID Name  Time
0   0    A   100
1   1    B    70
2   2    C    40
3   3    D    90

The code is

big_df = pd.DataFrame()
for i in range(1,3):
    fname = 'test_'   str(i)   '.csv'
    small_df = pd.read_csv(fname, skiprows=[1])
    print(small_df)
    frames = [big_df, small_df]
    big_df = pd.concat(frames) 
    i  = 1
big_df.set_index('ID', inplace=True)
print(big_df)

But the output is

   Name  Time
ID           
0     A   100
1     B    70
0     C    40
1     D    90

I want to copy the index values to ID column, but I know the set_index will make the column as an index. How can I fix the code for that purpose?

UPDATE

I found that big_df['ID'] = big_df.index will copy the index values to ID column.

CodePudding user response：

One option using concat an incrementation of the ID:

dfs = [df1, df2]

dic = dict(enumerate(map(len, dfs), start=1))
dic[0] = 0

out = (pd
  .concat(dfs, keys=range(len(dfs)))
  .assign(ID=lambda d: d['ID'].add(d.index.get_level_values(0).map(dic)))
  .reset_index(drop=True)
)

Output:

   ID Name  Time
0   0    A   100
1   1    B    70
2   2    C    40
3   3    D    90

CodePudding user response：

have you tried using the ignore_index option when concatenating?

pd.concat(frames, ignore_index=True)

CodePudding user response：

You can do this:

import pandas as pd

big_df = pd.DataFrame({'ID': [0, 1], 'Name': ['A', 'B'], 'Time': [100, 70]})
small_df = pd.DataFrame({'ID': [0, 1], 'Name': ['C', 'D'], 'Time': [40, 90]})

df = pd.concat([big_df, small_df])
df = df.reset_index(drop=True)

print(df)

CodePudding user response：

If you are trying to concatenating the dataframes then it can be done as such,

In [1]: df1 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A0", "A1", "A2", "A3"],
   ...:         "B": ["B0", "B1", "B2", "B3"],
   ...:         "C": ["C0", "C1", "C2", "C3"],
   ...:         "D": ["D0", "D1", "D2", "D3"],
   ...:     },
   ...:     index=[0, 1, 2, 3],
   ...: )
   ...: 

In [2]: df2 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A4", "A5", "A6", "A7"],
   ...:         "B": ["B4", "B5", "B6", "B7"],
   ...:         "C": ["C4", "C5", "C6", "C7"],
   ...:         "D": ["D4", "D5", "D6", "D7"],
   ...:     },
   ...:     index=[4, 5, 6, 7],
   ...: )
   ...: 

In [3]: df3 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A8", "A9", "A10", "A11"],
   ...:         "B": ["B8", "B9", "B10", "B11"],
   ...:         "C": ["C8", "C9", "C10", "C11"],
   ...:         "D": ["D8", "D9", "D10", "D11"],
   ...:     },
   ...:     index=[8, 9, 10, 11],
   ...: )
   ...: 

In [4]: frames = [df1, df2, df3]

In [5]: result = pd.concat(frames, ignore_index=True)

for more documentation take a look at this.

https://pandas.pydata.org/docs/user_guide/merging.html

CodePudding user response：

Here is a proposition with pandas.concat and pandas.DataFrame.index :

big_df = pd.concat([df1, df2], ignore_index=True).assign(ID= lambda x: x.index)

# Output :

print(big_df)

   ID Name  Time
0   0    A   100
1   1    B    70
2   2    C    40
3   3    D    90