Python: merge files into one and regroup rows based on previous labels in their own file

I have multiple files that need to be merged into one big file, but their group IDs should be renumbered so that they remain distinct in the newly generated file. See the following example:

dataframe 1: df1.txt

name group_id
A    1
B    1
C    1
D    2

dataframe 2: df2.txt

name group_id
E    1
F    1
G    2
H    3

dataframe 3: df3.txt

name group_id
I    1
J    2 
K    2
L    3

The final output is expected to be:

name group_id
A    1
B    1
C    1
D    2   
E    3
F    3
G    4
H    5
I    6
J    7 
K    7
L    8

The group_id column should be relabeled in the final data frame, but the new labels should still follow the grouping each row had in its original data frame.

For the code, I've tried to merge them into one but can't figure out how to label the group id part.

df = pd.concat(map(pd.read_csv, glob.glob('df*.txt')))

Thanks in advance!

Test data code if it can save you some time:

df1 = pd.DataFrame([['A', 1], ['B', 1], ['C', 1], ['D', 2]], columns=('name','group_id'))
df2 = pd.DataFrame([['E', 1], ['F', 1], ['G', 2], ['H', 3]], columns=('name','group_id'))
df3 = pd.DataFrame([['I', 1], ['J', 2], ['K', 2], ['L', 3]], columns=('name','group_id'))

CodePudding user response:

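We have to reassign the group numbers after concatenation. This can be done with groupby(...).ngroup(), which numbers the groups from 0 to (number of groups - 1). Each file keeps its own index after concatenation, so (df.index == 0).cumsum() increases by one at the first row of every file and acts as a per-file counter. Grouping by that counter together with group_id keeps equal group_ids from different files apart, and add(1) makes the numbering start at 1.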

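import glob
import pandas as pd
df = pd.concat(map(pd.read_csv, sorted(glob.glob('df*.txt'))))  # sorted: glob does not guarantee file order
df['group_id'] = df.groupby([(df.index == 0).cumsum(), 'group_id']).ngroup().add(1)  # (file counter, old id) -> new id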
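
To see what the one-liner does, here is a minimal sketch that uses the test frames from the question instead of reading files, and looks at the intermediate values. The names file_no and new_ids are only illustrative.

```python
import pandas as pd

# Test frames from the question (stand-ins for the files on disk).
df1 = pd.DataFrame([['A', 1], ['B', 1], ['C', 1], ['D', 2]], columns=('name', 'group_id'))
df2 = pd.DataFrame([['E', 1], ['F', 1], ['G', 2], ['H', 3]], columns=('name', 'group_id'))
df3 = pd.DataFrame([['I', 1], ['J', 2], ['K', 2], ['L', 3]], columns=('name', 'group_id'))

# Without ignore_index, every frame keeps its own 0..3 index.
df = pd.concat([df1, df2, df3])

# index == 0 is True on the first row of each frame, so the running sum
# is 1 for all rows of df1, 2 for df2 and 3 for df3.
file_no = (df.index == 0).cumsum()

# Number the (file_no, group_id) pairs; ngroup starts at 0, so add 1.
new_ids = df.groupby([file_no, 'group_id']).ngroup().add(1)

print(file_no.tolist())  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
print(new_ids.tolist())  # [1, 1, 1, 2, 3, 3, 4, 5, 6, 7, 7, 8]
```

Note that this relies on every file having the default 0-based index, so that exactly one row per file has index 0.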

Result

print(df)

  name  group_id
0    A         1
1    B         1
2    C         1
3    D         2
0    E         3
1    F         3
2    G         4
3    H         5
0    I         6
1    J         7
2    K         7
3    L         8
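
If the frames might not have a default index, one possible variant is to tag each frame with an explicit source number before concatenating, instead of relying on index == 0. This is only a sketch: relabel_groups and the helper column _source are hypothetical names, not part of pandas.

```python
import pandas as pd


def relabel_groups(frames):
    """Concatenate frames and renumber group_id so ids are unique across frames.

    Hypothetical helper: '_source' is a temporary column that records which
    frame a row came from; it is dropped before returning.
    """
    tagged = [frame.assign(_source=i) for i, frame in enumerate(frames)]
    out = pd.concat(tagged, ignore_index=True)
    # Groups are numbered in sorted (_source, group_id) order, starting at 0.
    out['group_id'] = out.groupby(['_source', 'group_id']).ngroup() + 1
    return out.drop(columns='_source')


df1 = pd.DataFrame([['A', 1], ['B', 1], ['C', 1], ['D', 2]], columns=('name', 'group_id'))
df2 = pd.DataFrame([['E', 1], ['F', 1], ['G', 2], ['H', 3]], columns=('name', 'group_id'))
df3 = pd.DataFrame([['I', 1], ['J', 2], ['K', 2], ['L', 3]], columns=('name', 'group_id'))

result = relabel_groups([df1, df2, df3])
print(result['group_id'].tolist())  # [1, 1, 1, 2, 3, 3, 4, 5, 6, 7, 7, 8]
```

With real files, the list of frames could be built as [pd.read_csv(p, sep=r'\s+') for p in sorted(glob.glob('df*.txt'))], assuming the files are whitespace-delimited as shown in the question. Because ngroup numbers groups in sorted key order, the new labels follow the sorted old group_ids within each file; passing sort=False to groupby would number them in order of first appearance instead.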