Home > Back-end >  How to copy a row base on condition and change one column base on same condition and add it back to
How to copy a row base on condition and change one column base on same condition and add it back to

Time:11-04

I have the below pandas dataframe.

d = {'id1': ['85643', '85644','8564312','8564314','85645','8564316','85646','8564318','85647','85648','85649','85655'],'ID': ['G-00001', 'G-00001','G-00002','G-00002','G-00001','G-00002','G-00001','G-00002','G-00001','G-00001','G-00001','G-00001'],'col1': [1, 2,3,4,5,60,0,0,6,3,2,4],'Goal': [np.nan, 56,np.nan,89,73,np.nan ,np.nan ,np.nan, np.nan, np.nan, 34,np.nan ], 'col2': [3, 4,32,43,55,610,0,0,16,23,72,48],'col3': [1, 22,33,44,55,60,1,5,6,3,2,4],'Name': ['a1asd', 'a2asd','aabsd','aabsd','a3asd','aabsd','aasd','aabsd','aasd','aasd','aasd','aasd'],'Date': ['2021-06-13', '2021-06-13','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16']}
dff = pd.DataFrame(data=d)
dff
        id1       ID  col1  Goal  col2  col3   Name        Date
0     85643  G-00001     1   NaN     3     1  a1asd  2021-06-13
1     85644  G-00001     2  56.0     4    22  a2asd  2021-06-13
2   8564312  G-00002     3   NaN    32    33  aabsd  2021-06-13
3   8564314  G-00002     4  89.0    43    44  aabsd  2021-06-14
4     85645  G-00001     5  73.0    55    55  a3asd  2021-06-15
5   8564316  G-00002    60   NaN   610    60  aabsd  2021-06-15
6     85646  G-00001     0   NaN     0     1   aasd  2021-06-13
7   8564318  G-00002     0   NaN     0     5  aabsd  2021-06-16
8     85647  G-00001     6   NaN    16     6   aasd  2021-06-13
9     85648  G-00001     3   NaN    23     3   aasd  2021-06-13
10    85649  G-00001     2  34.0    72     2   aasd  2021-06-13
11    85655  G-00001     4   NaN    48     4   aasd  2021-06-16

Also, I have some "id1" column slices.

b65 = ['85643','85645', '85655','85646']
b66 = ['85643','85645','85647','85648','85649','85644']
b67 = ['8564312','8564314','8564316','8564318']

Based on the "id1" column slices, I want to change the "ID" column and then I want to add that row back to the same dataframe.

for example, if we consider b65 slicer.

b65 = ['85643','85645', '85655','85646']

I want something like below dataframe.

        id1       ID  col1  Goal  col2  col3   Name        Date
0     85643  G-00001     1   NaN     3     1  a1asd  2021-06-13
1     85644  G-00001     2  56.0     4    22  a2asd  2021-06-13
2   8564312  G-00002     3   NaN    32    33  aabsd  2021-06-13
3   8564314  G-00002     4  89.0    43    44  aabsd  2021-06-14
4     85645  G-00001     5  73.0    55    55  a3asd  2021-06-15
5   8564316  G-00002    60   NaN   610    60  aabsd  2021-06-15
6     85646  G-00001     0   NaN     0     1   aasd  2021-06-13
7   8564318  G-00002     0   NaN     0     5  aabsd  2021-06-16
8     85647  G-00001     6   NaN    16     6   aasd  2021-06-13
9     85648  G-00001     3   NaN    23     3   aasd  2021-06-13
10    85649  G-00001     2  34.0    72     2   aasd  2021-06-13
11    85655  G-00001     4   NaN    48     4   aasd  2021-06-16
12    85643     b-65     1   NaN     3     1  a1asd  2021-06-13
13    85645     b-65     5  73.0    55    55  a3asd  2021-06-15
14    85646     b-65     0   NaN     0     1   aasd  2021-06-13
15    85655     b-65     4   NaN    48     4   aasd  2021-06-16

I want to do the same for the rest of the slicers (b66, b67) and add them back to the same dataframe. Is it possible to do that? Any suggestion? Thanks in advance

CodePudding user response:

You can use a dictionary with the slices, a list comprehension and pandas.concat:

slices = {'b-65': ['85643','85645', '85655','85646'],
          'b-66': ['85643','85645','85647','85648','85649','85644'],
          'b-67': ['8564312','8564314','8564316','8564318'],
          }

pd.concat([dff] 
          [dff[dff['id1'].isin(v)].assign(ID=k) for k,v in slices.items()],
          ignore_index=True)

output:

        id1       ID  col1  Goal  col2  col3   Name        Date
0     85643  G-00001     1   NaN     3     1  a1asd  2021-06-13
1     85644  G-00001     2  56.0     4    22  a2asd  2021-06-13
2   8564312  G-00002     3   NaN    32    33  aabsd  2021-06-13
3   8564314  G-00002     4  89.0    43    44  aabsd  2021-06-14
4     85645  G-00001     5  73.0    55    55  a3asd  2021-06-15
5   8564316  G-00002    60   NaN   610    60  aabsd  2021-06-15
6     85646  G-00001     0   NaN     0     1   aasd  2021-06-13
7   8564318  G-00002     0   NaN     0     5  aabsd  2021-06-16
8     85647  G-00001     6   NaN    16     6   aasd  2021-06-13
9     85648  G-00001     3   NaN    23     3   aasd  2021-06-13
10    85649  G-00001     2  34.0    72     2   aasd  2021-06-13
11    85655  G-00001     4   NaN    48     4   aasd  2021-06-16
12    85643     b-65     1   NaN     3     1  a1asd  2021-06-13
13    85645     b-65     5  73.0    55    55  a3asd  2021-06-15
14    85646     b-65     0   NaN     0     1   aasd  2021-06-13
15    85655     b-65     4   NaN    48     4   aasd  2021-06-16
16    85643     b-66     1   NaN     3     1  a1asd  2021-06-13
17    85644     b-66     2  56.0     4    22  a2asd  2021-06-13
18    85645     b-66     5  73.0    55    55  a3asd  2021-06-15
19    85647     b-66     6   NaN    16     6   aasd  2021-06-13
20    85648     b-66     3   NaN    23     3   aasd  2021-06-13
21    85649     b-66     2  34.0    72     2   aasd  2021-06-13
22  8564312     b-67     3   NaN    32    33  aabsd  2021-06-13
23  8564314     b-67     4  89.0    43    44  aabsd  2021-06-14
24  8564316     b-67    60   NaN   610    60  aabsd  2021-06-15
25  8564318     b-67     0   NaN     0     5  aabsd  2021-06-16

CodePudding user response:

Based on the solution of @mozway I tried to use your lists instead of the dict = slices he created. The changes are pretty simple

`df2 = (pd.concat([dff] 
  [dff[dff['id1'].isin([val])].assign(ID='b-65') for val in b65] 
  [dff[dff['id1'].isin([val])].assign(ID='b-66') for val in b66]
  [dff[dff['id1'].isin([val])].assign(ID='b-67') for val in b67]))`

but you need to add three lists one by one. If I try to do it with a list comprehension I'm just not able to access the name of the current list in the loop.

`df2 = (pd.concat([dff]  [dff[dff['id1'].isin([val])]
.assign(ID="name_of_the_list") for sublst in lst for val in sublst]))`

Is there a way to assign the name of the list to the IDor do you really must create a dictionary with b65/b66/b67 as keys?

  • Related