I have the below pandas dataframe.
d = {'id1': ['85643', '85644','8564312','8564314','85645','8564316','85646','8564318','85647','85648','85649','85655'],'ID': ['G-00001', 'G-00001','G-00002','G-00002','G-00001','G-00002','G-00001','G-00002','G-00001','G-00001','G-00001','G-00001'],'col1': [1, 2,3,4,5,60,0,0,6,3,2,4],'Goal': [np.nan, 56,np.nan,89,73,np.nan ,np.nan ,np.nan, np.nan, np.nan, 34,np.nan ], 'col2': [3, 4,32,43,55,610,0,0,16,23,72,48],'col3': [1, 22,33,44,55,60,1,5,6,3,2,4],'Name': ['a1asd', 'a2asd','aabsd','aabsd','a3asd','aabsd','aasd','aabsd','aasd','aasd','aasd','aasd'],'Date': ['2021-06-13', '2021-06-13','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16']}
dff = pd.DataFrame(data=d)
dff
id1 ID col1 Goal col2 col3 Name Date
0 85643 G-00001 1 NaN 3 1 a1asd 2021-06-13
1 85644 G-00001 2 56.0 4 22 a2asd 2021-06-13
2 8564312 G-00002 3 NaN 32 33 aabsd 2021-06-13
3 8564314 G-00002 4 89.0 43 44 aabsd 2021-06-14
4 85645 G-00001 5 73.0 55 55 a3asd 2021-06-15
5 8564316 G-00002 60 NaN 610 60 aabsd 2021-06-15
6 85646 G-00001 0 NaN 0 1 aasd 2021-06-13
7 8564318 G-00002 0 NaN 0 5 aabsd 2021-06-16
8 85647 G-00001 6 NaN 16 6 aasd 2021-06-13
9 85648 G-00001 3 NaN 23 3 aasd 2021-06-13
10 85649 G-00001 2 34.0 72 2 aasd 2021-06-13
11 85655 G-00001 4 NaN 48 4 aasd 2021-06-16
Also, I have some "id1" column slices.
b65 = ['85643','85645', '85655','85646']
b66 = ['85643','85645','85647','85648','85649','85644']
b67 = ['8564312','8564314','8564316','8564318']
Based on the "id1" column slices, I want to change the "ID" column and then I want to add that row back to the same dataframe.
for example, if we consider b65 slicer.
b65 = ['85643','85645', '85655','85646']
I want something like below dataframe.
id1 ID col1 Goal col2 col3 Name Date
0 85643 G-00001 1 NaN 3 1 a1asd 2021-06-13
1 85644 G-00001 2 56.0 4 22 a2asd 2021-06-13
2 8564312 G-00002 3 NaN 32 33 aabsd 2021-06-13
3 8564314 G-00002 4 89.0 43 44 aabsd 2021-06-14
4 85645 G-00001 5 73.0 55 55 a3asd 2021-06-15
5 8564316 G-00002 60 NaN 610 60 aabsd 2021-06-15
6 85646 G-00001 0 NaN 0 1 aasd 2021-06-13
7 8564318 G-00002 0 NaN 0 5 aabsd 2021-06-16
8 85647 G-00001 6 NaN 16 6 aasd 2021-06-13
9 85648 G-00001 3 NaN 23 3 aasd 2021-06-13
10 85649 G-00001 2 34.0 72 2 aasd 2021-06-13
11 85655 G-00001 4 NaN 48 4 aasd 2021-06-16
12 85643 b-65 1 NaN 3 1 a1asd 2021-06-13
13 85645 b-65 5 73.0 55 55 a3asd 2021-06-15
14 85646 b-65 0 NaN 0 1 aasd 2021-06-13
15 85655 b-65 4 NaN 48 4 aasd 2021-06-16
I want to do the same for the rest of the slicers (b66, b67) and add them back to the same dataframe. Is it possible to do that? Any suggestion? Thanks in advance
CodePudding user response:
You can use a dictionary with the slices, a list comprehension and pandas.concat
:
slices = {'b-65': ['85643','85645', '85655','85646'],
'b-66': ['85643','85645','85647','85648','85649','85644'],
'b-67': ['8564312','8564314','8564316','8564318'],
}
pd.concat([dff]
[dff[dff['id1'].isin(v)].assign(ID=k) for k,v in slices.items()],
ignore_index=True)
output:
id1 ID col1 Goal col2 col3 Name Date
0 85643 G-00001 1 NaN 3 1 a1asd 2021-06-13
1 85644 G-00001 2 56.0 4 22 a2asd 2021-06-13
2 8564312 G-00002 3 NaN 32 33 aabsd 2021-06-13
3 8564314 G-00002 4 89.0 43 44 aabsd 2021-06-14
4 85645 G-00001 5 73.0 55 55 a3asd 2021-06-15
5 8564316 G-00002 60 NaN 610 60 aabsd 2021-06-15
6 85646 G-00001 0 NaN 0 1 aasd 2021-06-13
7 8564318 G-00002 0 NaN 0 5 aabsd 2021-06-16
8 85647 G-00001 6 NaN 16 6 aasd 2021-06-13
9 85648 G-00001 3 NaN 23 3 aasd 2021-06-13
10 85649 G-00001 2 34.0 72 2 aasd 2021-06-13
11 85655 G-00001 4 NaN 48 4 aasd 2021-06-16
12 85643 b-65 1 NaN 3 1 a1asd 2021-06-13
13 85645 b-65 5 73.0 55 55 a3asd 2021-06-15
14 85646 b-65 0 NaN 0 1 aasd 2021-06-13
15 85655 b-65 4 NaN 48 4 aasd 2021-06-16
16 85643 b-66 1 NaN 3 1 a1asd 2021-06-13
17 85644 b-66 2 56.0 4 22 a2asd 2021-06-13
18 85645 b-66 5 73.0 55 55 a3asd 2021-06-15
19 85647 b-66 6 NaN 16 6 aasd 2021-06-13
20 85648 b-66 3 NaN 23 3 aasd 2021-06-13
21 85649 b-66 2 34.0 72 2 aasd 2021-06-13
22 8564312 b-67 3 NaN 32 33 aabsd 2021-06-13
23 8564314 b-67 4 89.0 43 44 aabsd 2021-06-14
24 8564316 b-67 60 NaN 610 60 aabsd 2021-06-15
25 8564318 b-67 0 NaN 0 5 aabsd 2021-06-16
CodePudding user response:
Based on the solution of @mozway I tried to use your lists instead of the dict = slices he created. The changes are pretty simple
`df2 = (pd.concat([dff]
[dff[dff['id1'].isin([val])].assign(ID='b-65') for val in b65]
[dff[dff['id1'].isin([val])].assign(ID='b-66') for val in b66]
[dff[dff['id1'].isin([val])].assign(ID='b-67') for val in b67]))`
but you need to add three lists one by one. If I try to do it with a list comprehension I'm just not able to access the name of the current list in the loop.
`df2 = (pd.concat([dff] [dff[dff['id1'].isin([val])]
.assign(ID="name_of_the_list") for sublst in lst for val in sublst]))`
Is there a way to assign the name of the list to the ID
or do you really must create a dictionary with b65/b66/b67 as keys?