How do I pop the last elements of lists nested in a dataframe?

Time:06-17

I am manipulating the following dataframe:

      CaseID                Activities
0          0        [10, 16, 577, 250]
1          1   [355, 10, 16, 577, 578]
2          2  [355, 578, 578, 16, 578]
3          3  [355, 12, 546, 438, 578]
4          4    [577, 180, 12, 79, 78]

My goal is to delete the last element of each list: 250 for case 0, 578 for case 1, and so on.

The following code works for a single row:

df.loc[0, 'Activities'].pop()

However, as soon as I try to iterate over the dataframe to pop the last list element in every row, I get TypeError: unhashable type: 'Series':

for row in df.iterrows():
    df.loc[row, 'Activities'].pop()

CodePudding user response:

Try:

df["Activities"] = df["Activities"].str[:-1]
print(df)

Prints:

   CaseID           Activities
0       0        [10, 16, 577]
1       1   [355, 10, 16, 577]
2       2  [355, 578, 578, 16]
3       3  [355, 12, 546, 438]
4       4   [577, 180, 12, 79]
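
One difference from pop() is worth noting: the .str[:-1] slice builds a new list for every row instead of modifying the existing ones. The snippet below is a minimal sketch of that behaviour, assuming the column holds plain Python lists; the two-row frame and the names original_first and trimmed are only for illustration.

```python
import pandas as pd

# Small frame rebuilt from the question's data (first two rows only)
df = pd.DataFrame(
    {
        "CaseID": [0, 1],
        "Activities": [[10, 16, 577, 250], [355, 10, 16, 577, 578]],
    }
)

# Keep a reference to the list object stored in the first row
original_first = df.loc[0, "Activities"]

# The slice is applied to each element of the column and returns new lists
df["Activities"] = df["Activities"].str[:-1]

trimmed = df["Activities"].tolist()
print(trimmed)          # lists stored in the column after slicing
print(original_first)   # the list object held before the assignment
```

Because the old lists are left alone, any other variable that still points at them keeps the full list, which may or may not be what you want.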

CodePudding user response:

Assuming the column holds lists, you do not need to iterate over the whole DataFrame; iterate over the Series (column) directly. Note that pop() modifies each list in place:

for l in df['Activities']:
    print(l.pop())
print(df)

Output:

250
578
578
578
78
   CaseID           Activities
0       0        [10, 16, 577]
1       1   [355, 10, 16, 577]
2       2  [355, 578, 578, 16]
3       3  [355, 12, 546, 438]
4       4   [577, 180, 12, 79]
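
As for the loop in the question, the error most likely comes from the fact that df.iterrows() yields (index, row) tuples, so the variable row there is a tuple containing a Series, and .loc cannot use it as a label. The following is a sketch of the same loop with the tuple unpacked; the empty list in the third row and the emptiness check are my additions, included because pop() raises IndexError on an empty list.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "CaseID": [0, 1, 2],
        "Activities": [[10, 16, 577, 250], [355, 10, 16, 577, 578], []],
    }
)

popped = []
for idx, _row in df.iterrows():  # unpack the (index, row) tuple
    activities = df.loc[idx, "Activities"]  # the list stored in this row
    if activities:  # skip empty lists, where pop() would raise IndexError
        popped.append(activities.pop())  # removes the last element in place

print(popped)
print(df["Activities"].tolist())
```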

CodePudding user response:

This might not be the most efficient way, but it avoids modifying the original lists in place. First, explode the lists into a single column, keeping the original index. Then group by the original index and keep every row but the last in each group. Finally, group by the original index once more and aggregate the results back into a list.

import pandas as pd

df = pd.DataFrame(
    {
        "CaseID": [0, 1, 2],
        "Activities": [
            [10, 16, 577, 250],
            [355, 10, 16, 577, 578],
            [355, 578, 578, 16, 578],
        ],
    }
)

df.assign(
    Activities=df.Activities.explode()
    .reset_index()
    .groupby("index", as_index=False)
    .apply(lambda s: s.head(len(s) - 1))
    .groupby("index")["Activities"]
    .apply(list)
)

   CaseID           Activities
0       0        [10, 16, 577]
1       1   [355, 10, 16, 577]
2       2  [355, 578, 578, 16]
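
To make the three steps easier to follow, here is a sketch that runs them one at a time. It is a variant rather than the exact chain above: it marks the last element of each group with cumcount instead of calling apply with head, and the names exploded, is_last, trimmed and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "CaseID": [0, 1, 2],
        "Activities": [
            [10, 16, 577, 250],
            [355, 10, 16, 577, 578],
            [355, 578, 578, 16, 578],
        ],
    }
)

# Step 1: one row per list element; the original row label repeats in the index
exploded = df["Activities"].explode()

# Step 2: cumcount(ascending=False) numbers rows from the end of each group,
# so the last element of every original list gets 0
is_last = exploded.groupby(level=0).cumcount(ascending=False) == 0
trimmed = exploded[~is_last.to_numpy()]

# Step 3: collect the remaining elements back into one list per original row
result = df.assign(Activities=trimmed.groupby(level=0).agg(list))
print(result)
```

One caveat with this approach: a row whose list has a single element disappears after step 2, so it ends up as NaN rather than an empty list in the result.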