Home > database >  Using map and loc properly
Using map and loc properly

Time:06-10

Here's my code

df[df.loc[:,'medium']=='appsflyer']['source'] = df[df.loc[:,'medium']=='appsflyer'].['source'].map(lambda x: x if x in appsflyer else "Others")

Where the variable "appsflyer" is a list with all the values i want to keep. If it's not in the list, i want to mark the value as 'others'.

Since i only want to change the values in the appsflyer medium, i used the loc operator to slice my dateframe.

The code is running, there are no warnings or errors, but the values didn't change at all.

What's wrong here?

CodePudding user response:

By using chained indices in the assignment, you are creating a copy of the dataframe. See: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

df[df.loc[:, "medium"] == "appsflyer"]["source"].map(lambda x: x if x in appsflyer else "other")

This part produces the correct output, but since the result is assigned to a copy, the values in the original dataframe are not modified.

What you can do instead is to directly slice the dataframe so that only the rows that you want to change are selected. For example:

df = pd.DataFrame({
"medium": ["appsflyer", "appsflyer", "not_appsflyer", "not_appsflyer"], 
"source": [1, 2, 3, 4]})
appsflyer = [1, 4]
df
          medium  source
0      appsflyer       1
1      appsflyer       2
2  not_appsflyer       3
3  not_appsflyer       4

We need to select rows that a) have the value "appsflyer" in the medium column and b) whose value in the source column is not in the appsflyer list (so not 1 or 4).

The mask for these two conditions looks like this:

mask = (df.medium == "appsflyer") & ~(df.source.isin(appsflyer))

Now, we can simply use that as the row index in loc and avoid chaining multiple indices:

df.loc[mask, "source"] = "other"

df
          medium source
0      appsflyer      1
1      appsflyer  other
2  not_appsflyer      3
3  not_appsflyer      4

CodePudding user response:

IIUC, try with where:

  • for rows where the medium is "appsflyer" and the "source" is in the appsflyer list, keep the original source"
  • for the rest, change the source to "Others"
df["source"] = df["source"].where(df["medium"].eq("appsflyer")&df["source"].isin(appsflyer),"Others")
  • Related