Let's say I have this dataframe in pandas:
data = {'user':['Apple', 'One', 'Tom'],
'col_1':[1, 2, 3],
'col_2':[4, 5, 6],
'col_3':[7, 8, 9]
}
df = pd.DataFrame(data)
Which outputs:
user | col_1 | col_2 | col_3 |
---|---|---|---|
Apple | 1 | 2 | 3 |
One | 4 | 5 | 6 |
Tom | 7 | 8 | 9 |
Now, here is the deal: I want to modify the value of columns col_1
, col_2
and col_3
if the user is within a specific list of users. Otherwise, I have to keep the current value of the columns.
My proposal:
First, I created a dictionary, where the keys show the users that need modifications. Also, for each user, I created a list of the new values (col_1
, col_2
and col_3
, respectively) This way:
my_dict = {
'one' : ['col_1_value_one', 'col_2_value_one', 'col_3_value_one'],
'two' : ['col_1_value_two', 'col_2_value_two', 'col_3_value_two'],
'apple': ['col_1_value_apple', 'col_2_value_apple', 'col_3_value_apple']
}
Then, I created a list based on the dict keys:
list_users = list(my_dict.keys())
Finally, I was able to verify whether the user was in fact within the list of keys or not by using np.where
, contains
and some regex (that is why I joined the list with "|" and used lower()
). It works properly when keeping the "old" value if the user is not present in the list.
df['col_1'] = np.where(df['user'].str.lower().str.contains('|'.join(list_users), na=False, regex=True),
"<I need to replace this!>",
df['col_1'])
df['col_2'] = np.where(df['user'].str.lower().str.contains('|'.join(list_users), na=False, regex=True),
"<I need to replace this!>",
df['col_2'])
df['col_3'] = np.where(df['user'].str.lower().str.contains('|'.join(list_users), na=False, regex=True),
"<I need to replace this!>",
df['col_3'])
However, as you can see I couldn't find a way to replace the values for the ones present in the dictionary values if the user is in fact within the list. For example, if the user is "Apple", then the new values for col_1
, col_2
and col_3
are:
- col_1_value_apple
- col_2_value_apple
- col_3_value_apple
Expected output based on example:
user | col_1 | col_2 | col_3 |
---|---|---|---|
Apple | col_1_value_apple | col_2_value_apple | col_3_value_apple |
One | col_1_value_one | col_2_value_one | col_3_value_one |
Tom | 7 | 8 | 9 |
Here, both "Apple" and "One" users are modified, whilst "Tom" is not in the list, therefore keeping their original values.
I am sure this is not the optimal way to solve this, but any help would be highly appreciated!! Thanks!
CodePudding user response:
This works:
for name, cols in my_dict.items():
df.loc[df['user'].str.lower() == name, ['col_1', 'col_2', 'col_3']] = cols