Replace specific data frame rows with consecutive list values

Time:11-08

I can't find the solution to this apparently simple problem.

I have a large dataframe. Every time a '0' appears in the Index column, it means that the data comes from a different source. I would like to replace the first '0' with the first element of my list, which applies until the next '0', then the second '0' with the second list element, and so on. The number of '0's matches the number of list elements.

import pandas as pd

d = {
    'Index': ['0', '1', '0', '1', '0', '1', '2', '3'],
    'user_id': ['id11', 'id54', 'id544', 'id659', 'id18', 'id549', 'id59', 'id59'],
    'launchingRole': ['writer', 'writer', 'writer', 'writer', 'reader', 'reader', 'reader', 'reader']
}
df2 = pd.DataFrame(d)

sessions_list = ['id599', '08749', '489w']

I've tried several solutions, but none seem to work. Any help is greatly appreciated.

CodePudding user response:

This answer assumes that sessions_list has the same length as the number of '0' entries in the Index column.

>>> import pandas as pd
>>>
>>>
>>> d = {
...     "Index": ["0", "1", "0", "1", "0", "1", "2", "3"],
...     "user_id": ["id11", "id54", "id544", "id659", "id18", "id549", "id59", "id59"],
...     "launchingRole": [
...         "writer",
...         "writer",
...         "writer",
...         "writer",
...         "reader",
...         "reader",
...         "reader",
...         "reader",
...     ],
... }
>>>
>>> df = pd.DataFrame(d)
>>> sessions_list = iter(["id599", "08749", "489w"])
>>> new_index = [index if index != "0" else next(sessions_list, "0") for index in df.Index]
>>> df.Index = new_index
>>> print(df)
   Index user_id launchingRole
0  id599    id11        writer
1      1    id54        writer
2  08749   id544        writer
3      1   id659        writer
4   489w    id18        reader
5      1   id549        reader
6      2    id59        reader
7      3    id59        reader
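
For a large dataframe, the same replacement can probably be done without a Python-level loop. The following is only a sketch under the same assumption (one list element per '0'); the name `mask` is mine, not from the question. It selects the '0' rows with a boolean mask and assigns the list to them in order:

```python
import pandas as pd

df = pd.DataFrame({
    "Index": ["0", "1", "0", "1", "0", "1", "2", "3"],
    "user_id": ["id11", "id54", "id544", "id659", "id18", "id549", "id59", "id59"],
})
sessions_list = ["id599", "08749", "489w"]

# True for every row whose Index is the string "0"
mask = df["Index"] == "0"

# Assumption: one list element per "0"; fail early if that does not hold
if mask.sum() != len(sessions_list):
    raise ValueError("number of '0' rows differs from len(sessions_list)")

# Assign the list elements, in order, to the masked rows only
df.loc[mask, "Index"] = sessions_list
print(df)
```

Unlike the iterator version, this raises if the counts differ instead of silently leaving extra '0' entries in place.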

CodePudding user response:

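We first check that sessions_list has the same length as the number of zeros.

This does what you seek. Instead of overwriting Index, it adds a source column that holds the matching list element for every row: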

import pandas as pd

d = {
    'Index': ['0', '1', '0','1', '0', '1', '2','3'],
    'user_id': ['id11', 'id54', 'id544','id659', 'id18', 'id549', 'id59','id59'],
    'launchingRole': ['writer', 'writer', 'writer','writer', 'reader', 'reader', 'reader','reader']
}
df2 = pd.DataFrame(d)

sessions_list = ['id599', '08749', '489w']

# flag the rows where Index is '0'
df2['zero'] = df2['Index'] == '0'

# check that the list has one element per '0'
if len(sessions_list) != df2['zero'].sum():
    raise ValueError("number of zeros != len(sessions_list)")

source = []
counter = -1
for is_zero in df2['zero']:
    # each '0' starts a new block, so move on to the next list element
    if is_zero:
        counter += 1
    source.append(sessions_list[counter])

df2['source'] = source
print(df2)

The result is:

  Index user_id launchingRole   zero source
0     0    id11        writer   True  id599
1     1    id54        writer  False  id599
2     0   id544        writer   True  08749
3     1   id659        writer  False  08749
4     0    id18        reader   True   489w
5     1   id549        reader  False   489w
6     2    id59        reader  False   489w
7     3    id59        reader  False   489w
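
If the loop turns out to be slow on a large dataframe, a cumulative sum over the '0' flags may give the same source column. This is a sketch, not a tested drop-in replacement; `is_zero` and `block` are names I made up for illustration. Each '0' increments the running count, so every row ends up with the number of the block it belongs to, and that number is then looked up in sessions_list:

```python
import pandas as pd

df2 = pd.DataFrame({
    "Index": ["0", "1", "0", "1", "0", "1", "2", "3"],
    "user_id": ["id11", "id54", "id544", "id659", "id18", "id549", "id59", "id59"],
})
sessions_list = ["id599", "08749", "489w"]

# True where a new source starts
is_zero = df2["Index"] == "0"

# Running count of zeros, shifted so the first block is numbered 0
block = is_zero.cumsum() - 1

# Map block number -> list element; rows before the first '0' (block -1)
# have no entry in the dict and therefore become NaN
df2["source"] = block.map(dict(enumerate(sessions_list)))
print(df2)
```

The length check from the loop version still applies here: with fewer list elements than zeros, the unmatched blocks would get NaN rather than raise.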