Left join table without duplicating right table row values-CodePudding

I have 2 dataframe's as below. I want to join right side table (cycletime) to left table (current data).

Left Table- Current data (df_current)

| datetime_index           | current | speed | cycle_counter |
|--------------------------|---------|-------|---------------|
| 27-10-2022 08:30:56.3056 | 30      | 60    | 1             |
| 27-10-2022 08:30:58.3058 | 30      | 60    | 1             |
| 27-10-2022 08:30:59.3059 | 31      | 62    | 1             |
| 27-10-2022 08:30:59.3059 | 30      | 60    | 1             |
| 27-10-2022 08:31:00.310  | 30.5    | 61    | 2             |
| 27-10-2022 08:31:01.311  | 30      | 60    | 2             |
| 27-10-2022 08:31:02.312  | 31      | 61    | 2             |
| 27-10-2022 08:31:02.312  | 30      | 60    | 3             |
| 27-10-2022 08:31:03.313  | 31      | 62    | 3             |
| 27-10-2022 08:31:04.314  | 30      | 60    | 3             |

Right Table- Cycletime data (df_cycletime)

| cycle_counter | total_time | up_time |
|---------------|------------|---------|
| 1             | 20         | 6       |
| 2             | 22         | 7       |
| 3             | 24         | 5       |

Code: I used the below code

df = df_current.reset_index().merge(df_cycletime, how='left', on=cyclecounter).set_index('datetime')

What I get

| datetime_index           | current | speed | cycle_counter | total_time | up_time |
|--------------------------|---------|-------|---------------|------------|---------|
| 27-10-2022 08:30:56.3056 | 30      | 60    | 1             | 20         | 6       |
| 27-10-2022 08:30:58.3058 | 30      | 60    | 1             | 20         | 6       |
| 27-10-2022 08:30:59.3059 | 31      | 62    | 1             | 20         | 6       |
| 27-10-2022 08:30:59.3059 | 30      | 60    | 1             | 20         | 6       |
| 27-10-2022 08:31:00.310  | 30.5    | 61    | 2             | 22         | 7       |
| 27-10-2022 08:31:01.311  | 30      | 60    | 2             | 22         | 7       |
| 27-10-2022 08:31:02.312  | 31      | 61    | 2             | 22         | 7       |
| 27-10-2022 08:31:02.312  | 30      | 60    | 3             | 24         | 5       |
| 27-10-2022 08:31:03.313  | 31      | 62    | 3             | 24         | 5       |
| 27-10-2022 08:31:04.314  | 30      | 60    | 3             | 24         | 5       |

Requirement: I don't want 'total_time' and 'up_time' to repeat, just only once for one cycle counter

| datetime_index           | current | speed | cycle_counter | total_time | up_time |
|--------------------------|---------|-------|---------------|------------|---------|
| 27-10-2022 08:30:56.3056 | 30      | 60    | 1             | 20         | 6       |
| 27-10-2022 08:30:58.3058 | 30      | 60    | 1             |            |         |
| 27-10-2022 08:30:59.3059 | 31      | 62    | 1             |            |         |
| 27-10-2022 08:30:59.3059 | 30      | 60    | 1             |            |         |
| 27-10-2022 08:31:00.310  | 30.5    | 61    | 2             | 22         | 7       |
| 27-10-2022 08:31:01.311  | 30      | 60    | 2             |            |         |
| 27-10-2022 08:31:02.312  | 31      | 61    | 2             |            |         |
| 27-10-2022 08:31:02.312  | 30      | 60    | 3             | 24         | 5       |
| 27-10-2022 08:31:03.313  | 31      | 62    | 3             |            |         |
| 27-10-2022 08:31:04.314  | 30      | 60    | 3             |            |         |

CodePudding user response：

You have to find duplicates in column total_time and column up_time according to cycle_counter column and replace them with empty string (""). this will work for all data.

df.loc[df.duplicated(['cycle_counter','total_time', 'up_time']), ['total_time','up_time']] = ""
print(df)

    cycle_counter   total_time  up_time
0   1               20          6
1   1       
2   1       
3   1       
4   2               22          7
5   2       
6   2       
7   3               24          5
8   3       
9   3