I am new in python, I am fetching data as JSON and I transferred into dataframes. I want to update disks column value with another dataframe column value based on volume_id actually it is kind of replace value. Data' are examples not exact.
df1
instance_id disks
0 i-0c314bb8d450cvv95 [vol-08662a47184e0b3b3, vol-07a18ce1652ff88f3,...
1 i-064e75dvdb8fb8d75 [vol-0f2911a1d0b9a521e, vol-0c28d13ada32564da]
2 i-04a85cb1dccvd895b [vol-069bf62bf82f32402, vol-0424a5deb55024a47]
3 i-0572cvvdf430d7475 [vol-0e3b471dcbf0fbd47, vol-003d7443d8696a691]
4 i-0fe4cbbfsfgb51c14 [vol-0bcf2aa4f873cc296, vol-03d2a0f318973aaec]
df2
VolumeID instance_id size type
0 vol-02d8942df26543361 i-0533ae59fgt06cb7a 37 gp3
1 vol-0bcf2aa4f8754c296 i-0fe4cbb54df151c14 450 st1
2 vol-069bf84bf82f32402 i-04a85cfgd489d895b 50 gp3
3 vol-0424a5deb54524a47 i-04a85cb1fdg45895b 150 st1
4 vol-0871a4e6f387db693 i-0f2ae964gfdgf3caa 50 gp3
desired df
instance_id disks
0 i-0c314bb8d450cvv95 50 100 200
1 i-064e75dvdb8fb8d75 65 150
2 i-04a85cb1dccvd895b 50 150
3 i-0572cvvdf430d7475 100 150
4 i-0fe4cbbfsfgb51c14 80 150
CodePudding user response:
try this:
mapping_dict=dict(zip(df2["VolumeID"],df2["size"]))
df1["disks"]=df1["disks"].apply(lambda x:list(map(lambda value:mapping_dict[value],x)))
Btw, I don't think it's the best answer but it should work
CodePudding user response:
Here's a way to do what your question asks:
res = ( df1
.set_index('instance_id')
.explode('disks')
.join(df2.set_index('VolumeID'), on='disks')['size']
.groupby('instance_id')
.agg(disks=list)
.reset_index() )
Explanation:
- use
join()
to map eachVolumeID
to the volumes in the explodeddisks
column to a row indf2
, then turn the resultingsize
values into a list for each group usinggroupby()
oninstance_id
. - Note: This assumes
VolumeID
is a unique key. If instead the unique key were the compositeinstance_id, VolumeID
, a different approach would be required.
Sample input:
df1:
instance_id disks
0 i_00 [vol-000, vol-001, vol-002]
1 i_01 [vol-010, vol-011, vol-012]
2 i_02 [vol-020, vol-021, vol-022]
3 i_03 [vol-030, vol-031, vol-032]
4 i_04 [vol-040, vol-041, vol-042]
5 i_05 [vol-050, vol-051, vol-052]
6 i_06 [vol-060, vol-061, vol-062]
7 i_07 [vol-070, vol-071, vol-072]
8 i_08 [vol-080, vol-081, vol-082]
9 i_09 [vol-090, vol-091, vol-092]
df2:
VolumeID instance_id size type
0 vol-000 na 0 na
1 vol-001 na 5 na
2 vol-002 na 10 na
3 vol-010 na 15 na
4 vol-011 na 20 na
5 vol-012 na 25 na
6 vol-020 na 30 na
7 vol-021 na 35 na
8 vol-022 na 40 na
9 vol-030 na 45 na
10 vol-031 na 50 na
11 vol-032 na 55 na
12 vol-040 na 60 na
13 vol-041 na 65 na
14 vol-042 na 70 na
15 vol-050 na 75 na
16 vol-051 na 80 na
17 vol-052 na 85 na
18 vol-060 na 90 na
19 vol-061 na 95 na
20 vol-062 na 100 na
21 vol-070 na 105 na
22 vol-071 na 110 na
23 vol-072 na 115 na
24 vol-080 na 120 na
25 vol-081 na 125 na
26 vol-082 na 130 na
27 vol-090 na 135 na
28 vol-091 na 140 na
29 vol-092 na 145 na
Output:
instance_id disks
0 i_00 [0, 5, 10]
1 i_01 [15, 20, 25]
2 i_02 [30, 35, 40]
3 i_03 [45, 50, 55]
4 i_04 [60, 65, 70]
5 i_05 [75, 80, 85]
6 i_06 [90, 95, 100]
7 i_07 [105, 110, 115]
8 i_08 [120, 125, 130]
9 i_09 [135, 140, 145]