Creating a new column with lists in a pandas DataFrame based on another column

Time:07-13

I have a pandas dataframe where some of the rows have multiple entries. I would like to match up a list I have to the third column. I have tried different things, but it isn't working for some reason.

Current df

username_list= ["charles23", "ems12", "", "sam34", "jon134", "", "jy19"]


ID1     ID2   passcode                    
01      01    Charlie233, Emily13         
01      02    
01      03    Sam310, John12               
01      04    
01      05    Jake42                      

Desired df

ID1     ID2   passcode                     username
01      01    Charlie233, Emily13          charles23, ems12
01      02                                
01      03    Sam310, John12               sam34, jon134
01      04                                
01      05    Jake42                       jy19

What I tried

df = df.assign(passcode = df["passcode"].str.split(",")).explode(column="passcode").assign(username=username_list).groupby(["ID1", "ID2"])["passcode", "username"].agg(list)

df.assign(
    passcode=df["passcode"].apply(lambda x: ", ".join(x) if x else ""),
    username=df["username"].apply(lambda x: ", ".join(x))
).reset_index()

ValueError: Length of values (1000) does not match length of index (1008)

I don't know why this error keeps happening given that I checked len(username_list) == len(df["passcode"])

CodePudding user response:

You can do:

df['pl'] = df['passcode'].str.split(',').str.len()
df['pi'] = df['pl'].cumsum() - df['pl']
df['username'] = df.apply(lambda x: username_list[x['pi']:x['pi'] + x['pl']],
                      axis=1).str.join(',')
df.drop(['pi', 'pl'], axis=1, inplace=True)

output (print(df)):

   ID1  ID2             passcode         username
0    1    1  Charlie233, Emily13  charles23,ems12
1    1    2                                      
2    1    3       Sam310, John12     sam34,jon134
3    1    4                                      
4    1    5               Jake42             jy19
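
To reproduce this output from scratch, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and assumes the empty passcodes are empty strings rather than NaN. It is a variant of the code above, not the same code: it uses a list comprehension instead of df.apply, so the temporary columns are not needed, and the names counts and starts are only illustrative.

```python
import pandas as pd

username_list = ["charles23", "ems12", "", "sam34", "jon134", "", "jy19"]

# Sample frame rebuilt from the question (assumption: empty passcodes are "").
df = pd.DataFrame({
    "ID1": [1, 1, 1, 1, 1],
    "ID2": [1, 2, 3, 4, 5],
    "passcode": ["Charlie233, Emily13", "", "Sam310, John12", "", "Jake42"],
})

# Number of comma-separated entries per row; "" still counts as one entry.
counts = df["passcode"].str.split(",").str.len()
# Position in username_list where each row's usernames begin.
starts = counts.cumsum() - counts

# Slice the flat list once per row and join the pieces back into a string.
df["username"] = [
    ",".join(username_list[start:start + n])
    for start, n in zip(starts, counts)
]
print(df)
```

If the real data holds NaN instead of empty strings, calling .fillna("") on the passcode column first should keep the counts as integers.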

Explanation:

Let's look at username_list first. Its entries line up with the comma-separated values in passcode, with an empty string standing in for each empty row. For each row we still need to know where its slice of the list starts and stops, and the following trick gives us that:

df['pl'] = df['passcode'].str.split(',').str.len()
df['pi'] = df['pl'].cumsum()-df['pl']

where pl is the number of passcodes in the row and pi is the position in username_list where that row's usernames start.
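
For the sample data, the two helper values work out as follows. This is a small sketch on a bare Series, assuming the empty rows hold empty strings:

```python
import pandas as pd

passcode = pd.Series(["Charlie233, Emily13", "", "Sam310, John12", "", "Jake42"])

pl = passcode.str.split(",").str.len()  # entries per row
pi = pl.cumsum() - pl                   # running total of all earlier rows

print(pl.tolist())  # [2, 1, 2, 1, 1]
print(pi.tolist())  # [0, 2, 3, 5, 6]
```

So the third row, for example, takes username_list[3:5], which is ["sam34", "jon134"].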

Then use these two values to slice username_list for each row, and join each slice with ',' using the pandas str.join method (use ', ' instead if you want the spacing shown in the desired output):

df['username'] = df.apply(
        lambda x: username_list[x['pi']:x['pi'] + x['pl']], axis=1).str.join(',')

Then drop the columns pi, pl

df.drop(['pi', 'pl'], axis=1, inplace=True)
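
As for the ValueError in the question, the most likely cause is that assign(username=username_list) runs after explode, so the list has to match the number of exploded rows (1008 in the error message) rather than the number of original rows (1000). Comparing len(username_list) with len(df["passcode"]) before exploding checks the wrong number. A quick sanity check, sketched on the sample data with illustrative variable names:

```python
import pandas as pd

username_list = ["charles23", "ems12", "", "sam34", "jon134", "", "jy19"]
passcode = pd.Series(["Charlie233, Emily13", "", "Sam310, John12", "", "Jake42"])

n_rows = len(passcode)
# Total entries after splitting: this is the row count explode would produce.
n_pieces = int(passcode.str.split(",").str.len().sum())

print(n_rows, n_pieces, len(username_list))  # 5 7 7
```

The slicing approach only lines up when the last two numbers agree. If they differ on the real data, the list is missing entries (or has extras) for some rows.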