I can't find the solution to this apparently simple problem.
I have a large dataframe. Every time a 0 appears in the Index column, it means that the data comes from a different source. I would like to substitute the first '0' with the first element of my list, up until the next '0'; the second '0' with the second list element; and so on. The number of 0s matches the number of list elements.
import pandas as pd

d = {
    'Index': ['0', '1', '0', '1', '0', '1', '2', '3'],
    'user_id': ['id11', 'id54', 'id544', 'id659', 'id18', 'id549', 'id59', 'id59'],
    'launchingRole': ['writer', 'writer', 'writer', 'writer', 'reader', 'reader', 'reader', 'reader']
}
df2 = pd.DataFrame(d)
sessions_list = ['id599', '08749', '489w']
I've tried several solutions, but none seems to work. Any help is greatly appreciated.
Answer:
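This answer assumes that sessions_list has the same length as the number of 0s in the Index column. If the list is shorter, the surplus zeros are left as "0", the default passed to next().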
>>> import pandas as pd
>>>
>>>
>>> d = {
... "Index": ["0", "1", "0", "1", "0", "1", "2", "3"],
... "user_id": ["id11", "id54", "id544", "id659", "id18", "id549", "id59", "id59"],
... "launchingRole": [
... "writer",
... "writer",
... "writer",
... "writer",
... "reader",
... "reader",
... "reader",
... "reader",
... ],
... }
>>>
>>> df = pd.DataFrame(d)
>>> sessions_list = iter(["id599", "08749", "489w"])
>>> new_index = [index if index != "0" else next(sessions_list, "0") for index in df.Index]
>>> df.Index = new_index
>>> print(df)
   Index user_id launchingRole
0  id599    id11        writer
1      1    id54        writer
2  08749   id544        writer
3      1   id659        writer
4   489w    id18        reader
5      1   id549        reader
6      2    id59        reader
7      3    id59        reader
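For comparison, here is a minimal sketch that produces the same replace-only-the-zeros result without an iterator: a boolean mask selects the "0" cells and .loc assigns the list to them in order. It reuses the question's sample data and, unlike the answer above, raises an error instead of silently leaving surplus zeros when the lengths differ.

```python
import pandas as pd

d = {
    "Index": ["0", "1", "0", "1", "0", "1", "2", "3"],
    "user_id": ["id11", "id54", "id544", "id659", "id18", "id549", "id59", "id59"],
    "launchingRole": ["writer"] * 4 + ["reader"] * 4,
}
df = pd.DataFrame(d)
sessions_list = ["id599", "08749", "489w"]

# Boolean mask marking the rows whose Index is the string "0"
is_zero = df["Index"] == "0"

# Same assumption as the answer: exactly one list element per zero
if is_zero.sum() != len(sessions_list):
    raise ValueError("number of zeros does not match len(sessions_list)")

# Assign the list, in order, to the masked cells only; the 1, 2, 3 rows are untouched
df.loc[is_zero, "Index"] = sessions_list
print(df)
```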
Answer:
This answer first checks that sessions_list has the same length as the number of zeros, then carries each list element forward into a new source column until the next zero. This does what you seek:
import pandas as pd

d = {
    'Index': ['0', '1', '0', '1', '0', '1', '2', '3'],
    'user_id': ['id11', 'id54', 'id544', 'id659', 'id18', 'id549', 'id59', 'id59'],
    'launchingRole': ['writer', 'writer', 'writer', 'writer', 'reader', 'reader', 'reader', 'reader']
}
df2 = pd.DataFrame(d)
sessions_list = ['id599', '08749', '489w']

# identify the zeros
df2['zero'] = df2['Index'] == '0'

# check that the list is the same length as the number of zeros
if len(sessions_list) != df2['zero'].sum():
    raise ValueError("number of zeros <> length of sessions_list")

source = []
counter = -1
for is_zero in df2['zero']:
    if is_zero:
        # a new source block starts: move on to the next list element
        counter += 1
    # every row gets the list element of the block it belongs to
    source.append(sessions_list[counter])
df2['source'] = source
print(df2)
The result is:
   Index user_id launchingRole  zero source
0      0    id11        writer  True  id599
1      1    id54        writer False  id599
2      0   id544        writer  True  08749
3      1   id659        writer False  08749
4      0    id18        reader  True   489w
5      1   id549        reader False   489w
6      2    id59        reader False   489w
7      3    id59        reader False   489w
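The loop, which carries each list element forward until the next '0', can also be sketched in vectorized form. A cumulative sum of the zero mask numbers the blocks (1, 1, 2, 2, 3, 3, 3, 3), and subtracting 1 turns those numbers into positions in sessions_list. This cumsum approach is a swapped-in technique, not the answer's own code; it assumes the first row is a '0' and that the list length equals the number of zeros, and it checks both up front.

```python
import pandas as pd

d = {
    'Index': ['0', '1', '0', '1', '0', '1', '2', '3'],
    'user_id': ['id11', 'id54', 'id544', 'id659', 'id18', 'id549', 'id59', 'id59'],
    'launchingRole': ['writer'] * 4 + ['reader'] * 4,
}
df2 = pd.DataFrame(d)
sessions_list = ['id599', '08749', '489w']

# True wherever a new source block starts
zero = df2['Index'] == '0'

# Assumptions: the frame starts with a '0' and there is one list element per zero
if not zero.iloc[0] or zero.sum() != len(sessions_list):
    raise ValueError("expected a leading '0' and one list element per zero")

# cumsum over booleans counts the zeros seen so far; minus 1 gives a 0-based list position
block = zero.cumsum() - 1

# Look up the list element for each row's block
df2['source'] = [sessions_list[i] for i in block]
print(df2)
```

Unlike the loop, this keeps the mask in a local variable, so no helper zero column is added to the frame.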