I have a dataframe that discretizes the customers into different Q's, which looks like:
    CustomerID_num  Q1  Q2  Q3  Q4  Q5         Country
0            12346   1   0   0   0   0  United Kingdom
2            12347   0   0   0   0   1         Iceland
9            12348   0   1   0   0   0         Finland
13           12349   0   0   0   0   1           Italy
14           12350   0   1   0   0   0          Norway
What I want to do is add a new column, Q, to the dataframe that shows which sector each customer is in, so it looks like:
    CustomerID_num  Q1  Q2  Q3  Q4  Q5  Q         Country
0            12346   1   0   0   0   0  1  United Kingdom
2            12347   0   0   0   0   1  5         Iceland
9            12348   0   1   0   0   0  2         Finland
13           12349   0   0   0   0   1  5           Italy
14           12350   0   1   0   0   0  2          Norway
The only way I can think of is a for loop, but that would be messy. Is there another way to do this?
CodePudding user response:
One option is to drop down into numpy:
Filter for just the Q columns:
cols = df.filter(like = 'Q')
Get the column positions that are equal to 1:
_, positions = cols.to_numpy().nonzero()
Add 1 to the zero-based positions and assign the result as the new column:
df.assign(Q = positions + 1)
    CustomerID_num  Q1  Q2  Q3  Q4  Q5         Country  Q
0            12346   1   0   0   0   0  United Kingdom  1
2            12347   0   0   0   0   1         Iceland  5
9            12348   0   1   0   0   0         Finland  2
13           12349   0   0   0   0   1           Italy  5
14           12350   0   1   0   0   0          Norway  2
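For anyone who wants to run the numpy approach end to end, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the sample in the question, and the one-flag-per-row check is an added safeguard rather than part of the answer: nonzero returns one entry per nonzero cell, so the positions only line up with the rows when every row has exactly one 1.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (index values included).
df = pd.DataFrame(
    {
        "CustomerID_num": [12346, 12347, 12348, 12349, 12350],
        "Q1": [1, 0, 0, 0, 0],
        "Q2": [0, 0, 1, 0, 1],
        "Q3": [0, 0, 0, 0, 0],
        "Q4": [0, 0, 0, 0, 0],
        "Q5": [0, 1, 0, 1, 0],
        "Country": ["United Kingdom", "Iceland", "Finland", "Italy", "Norway"],
    },
    index=[0, 2, 9, 13, 14],
)

# Keep only the columns whose name contains "Q" (Q1..Q5 here).
cols = df.filter(like="Q")

# Added safeguard (assumption): each customer has exactly one flag set.
if not (cols.sum(axis=1) == 1).all():
    raise ValueError("expected exactly one 1 per row in the Q columns")

# nonzero() returns (row indices, column indices) in row-major order.
_, positions = cols.to_numpy().nonzero()

# Column positions are zero-based, so add 1 to get the Q number.
result = df.assign(Q=positions + 1)
print(result)
```

Note that filter(like="Q") matches any column name containing "Q", so this relies on no other column (such as an existing Q column) matching.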
CodePudding user response:
df.loc[df["Q1"] == 1, "Q"] = 1
df.loc[df["Q2"] == 1, "Q"] = 2
df.loc[df["Q3"] == 1, "Q"] = 3
df.loc[df["Q4"] == 1, "Q"] = 4
df.loc[df["Q5"] == 1, "Q"] = 5
This is a possible solution using loc from pandas; here is the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
For each row where the condition is true, loc sets the value in the column named "Q". Rows where the condition is false are left unchanged.
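The five loc assignments can also be written as a short loop over the column numbers. This is only a sketch of the same idea on data rebuilt from the question. Initialising Q to 0 first is an added assumption, not part of the answer: without it, rows that match no condition would hold NaN and the column would become float.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame(
    {
        "CustomerID_num": [12346, 12347, 12348, 12349, 12350],
        "Q1": [1, 0, 0, 0, 0],
        "Q2": [0, 0, 1, 0, 1],
        "Q3": [0, 0, 0, 0, 0],
        "Q4": [0, 0, 0, 0, 0],
        "Q5": [0, 1, 0, 1, 0],
        "Country": ["United Kingdom", "Iceland", "Finland", "Italy", "Norway"],
    },
    index=[0, 2, 9, 13, 14],
)

# Assumption: 0 marks "no sector found" and keeps the column integer-typed.
df["Q"] = 0

# Same assignments as above, one per Q column.
for n in range(1, 6):
    df.loc[df[f"Q{n}"] == 1, "Q"] = n

print(df)
```

If a row had more than one flag set, the last matching column would win, since each assignment overwrites the previous one.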