For the data frames like
ID Name Time
0 0 A 100
1 1 B 70
ID Name Time
0 0 C 40
1 1 D 90
I want to join them by rows and reset the ID
numbers. So the final data frame should be
ID Name Time
0 0 A 100
1 1 B 70
2 2 C 40
3 3 D 90
The code is
big_df = pd.DataFrame()
for i in range(1,3):
fname = 'test_' str(i) '.csv'
small_df = pd.read_csv(fname, skiprows=[1])
print(small_df)
frames = [big_df, small_df]
big_df = pd.concat(frames)
i = 1
big_df.set_index('ID', inplace=True)
print(big_df)
But the output is
Name Time
ID
0 A 100
1 B 70
0 C 40
1 D 90
I want to copy the index values to ID
column, but I know the set_index
will make the column as an index. How can I fix the code for that purpose?
UPDATE
I found that big_df['ID'] = big_df.index
will copy the index values to ID
column.
CodePudding user response:
One option using concat
an incrementation of the ID:
dfs = [df1, df2]
dic = dict(enumerate(map(len, dfs), start=1))
dic[0] = 0
out = (pd
.concat(dfs, keys=range(len(dfs)))
.assign(ID=lambda d: d['ID'].add(d.index.get_level_values(0).map(dic)))
.reset_index(drop=True)
)
Output:
ID Name Time
0 0 A 100
1 1 B 70
2 2 C 40
3 3 D 90
CodePudding user response:
have you tried using the ignore_index
option when concatenating?
pd.concat(frames, ignore_index=True)
CodePudding user response:
You can do this:
import pandas as pd
big_df = pd.DataFrame({'ID': [0, 1], 'Name': ['A', 'B'], 'Time': [100, 70]})
small_df = pd.DataFrame({'ID': [0, 1], 'Name': ['C', 'D'], 'Time': [40, 90]})
df = pd.concat([big_df, small_df])
df = df.reset_index(drop=True)
print(df)
CodePudding user response:
If you are trying to concatenating the dataframes then it can be done as such,
In [1]: df1 = pd.DataFrame(
...: {
...: "A": ["A0", "A1", "A2", "A3"],
...: "B": ["B0", "B1", "B2", "B3"],
...: "C": ["C0", "C1", "C2", "C3"],
...: "D": ["D0", "D1", "D2", "D3"],
...: },
...: index=[0, 1, 2, 3],
...: )
...:
In [2]: df2 = pd.DataFrame(
...: {
...: "A": ["A4", "A5", "A6", "A7"],
...: "B": ["B4", "B5", "B6", "B7"],
...: "C": ["C4", "C5", "C6", "C7"],
...: "D": ["D4", "D5", "D6", "D7"],
...: },
...: index=[4, 5, 6, 7],
...: )
...:
In [3]: df3 = pd.DataFrame(
...: {
...: "A": ["A8", "A9", "A10", "A11"],
...: "B": ["B8", "B9", "B10", "B11"],
...: "C": ["C8", "C9", "C10", "C11"],
...: "D": ["D8", "D9", "D10", "D11"],
...: },
...: index=[8, 9, 10, 11],
...: )
...:
In [4]: frames = [df1, df2, df3]
In [5]: result = pd.concat(frames, ignore_index=True)
for more documentation take a look at this.
https://pandas.pydata.org/docs/user_guide/merging.html
CodePudding user response:
Here is a proposition with pandas.concat
and pandas.DataFrame.index
:
big_df = pd.concat([df1, df2], ignore_index=True).assign(ID= lambda x: x.index)
# Output :
print(big_df)
ID Name Time
0 0 A 100
1 1 B 70
2 2 C 40
3 3 D 90