I have 2 dataframe's as below. I want to join right side table (cycletime) to left table (current data).
Left Table- Current data (df_current)
| datetime_index | current | speed | cycle_counter |
|--------------------------|---------|-------|---------------|
| 27-10-2022 08:30:56.3056 | 30 | 60 | 1 |
| 27-10-2022 08:30:58.3058 | 30 | 60 | 1 |
| 27-10-2022 08:30:59.3059 | 31 | 62 | 1 |
| 27-10-2022 08:30:59.3059 | 30 | 60 | 1 |
| 27-10-2022 08:31:00.310 | 30.5 | 61 | 2 |
| 27-10-2022 08:31:01.311 | 30 | 60 | 2 |
| 27-10-2022 08:31:02.312 | 31 | 61 | 2 |
| 27-10-2022 08:31:02.312 | 30 | 60 | 3 |
| 27-10-2022 08:31:03.313 | 31 | 62 | 3 |
| 27-10-2022 08:31:04.314 | 30 | 60 | 3 |
Right Table- Cycletime data (df_cycletime)
| cycle_counter | total_time | up_time |
|---------------|------------|---------|
| 1 | 20 | 6 |
| 2 | 22 | 7 |
| 3 | 24 | 5 |
Code: I used the below code
df = df_current.reset_index().merge(df_cycletime, how='left', on=cyclecounter).set_index('datetime')
What I get
| datetime_index | current | speed | cycle_counter | total_time | up_time |
|--------------------------|---------|-------|---------------|------------|---------|
| 27-10-2022 08:30:56.3056 | 30 | 60 | 1 | 20 | 6 |
| 27-10-2022 08:30:58.3058 | 30 | 60 | 1 | 20 | 6 |
| 27-10-2022 08:30:59.3059 | 31 | 62 | 1 | 20 | 6 |
| 27-10-2022 08:30:59.3059 | 30 | 60 | 1 | 20 | 6 |
| 27-10-2022 08:31:00.310 | 30.5 | 61 | 2 | 22 | 7 |
| 27-10-2022 08:31:01.311 | 30 | 60 | 2 | 22 | 7 |
| 27-10-2022 08:31:02.312 | 31 | 61 | 2 | 22 | 7 |
| 27-10-2022 08:31:02.312 | 30 | 60 | 3 | 24 | 5 |
| 27-10-2022 08:31:03.313 | 31 | 62 | 3 | 24 | 5 |
| 27-10-2022 08:31:04.314 | 30 | 60 | 3 | 24 | 5 |
Requirement: I don't want 'total_time' and 'up_time' to repeat, just only once for one cycle counter
| datetime_index | current | speed | cycle_counter | total_time | up_time |
|--------------------------|---------|-------|---------------|------------|---------|
| 27-10-2022 08:30:56.3056 | 30 | 60 | 1 | 20 | 6 |
| 27-10-2022 08:30:58.3058 | 30 | 60 | 1 | | |
| 27-10-2022 08:30:59.3059 | 31 | 62 | 1 | | |
| 27-10-2022 08:30:59.3059 | 30 | 60 | 1 | | |
| 27-10-2022 08:31:00.310 | 30.5 | 61 | 2 | 22 | 7 |
| 27-10-2022 08:31:01.311 | 30 | 60 | 2 | | |
| 27-10-2022 08:31:02.312 | 31 | 61 | 2 | | |
| 27-10-2022 08:31:02.312 | 30 | 60 | 3 | 24 | 5 |
| 27-10-2022 08:31:03.313 | 31 | 62 | 3 | | |
| 27-10-2022 08:31:04.314 | 30 | 60 | 3 | | |
CodePudding user response:
You have to find duplicates in column total_time and column up_time according to cycle_counter column and replace them with empty string (""). this will work for all data.
df.loc[df.duplicated(['cycle_counter','total_time', 'up_time']), ['total_time','up_time']] = ""
print(df)
cycle_counter total_time up_time
0 1 20 6
1 1
2 1
3 1
4 2 22 7
5 2
6 2
7 3 24 5
8 3
9 3