I have a dataframe as below:
import pandas as pd
# intialise data of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
'Book1':[20, 21, 19, 18],
'Book2':[20,'', 12, 20],
'Book3':[31, 21, 17, 16],
'Book4':[31, 19, 18, 16]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
Name Book1 Book2 Book3 Book4
Tom 20 20 31 31
nick 21 21 19
krish 19 12 17 18
jack 18 20 16 16
I wish to get below output which is comparing Book1, Book2, Book3 and Book4 column. For row Tom output, there are two 20 and two 31, since the number of the value is equal valent so in this case it will prefer the value come fist that is Book1, so the Output column is 20. For row nick, there was two number 21 and one number 19, so it will take the most occurrence number for output column which is number 21. While for krish row, there was no repeated number so the output column i want fix it as "Mix" .
Output column as below:
Name Book1 Book2 Book3 Book4 Output
Tom 20 20 31 31 20
nick 21 21 19 21
krish 19 12 17 18 Mix
jack 18 20 16 16 16
Anyone have ideas? I saw there is mode function but it was not applicable for this case, please help, thanks
CodePudding user response:
Use value_counts
:
max_val = lambda x: x.value_counts().index[0] \
if x.value_counts().iloc[0] > 1 else 'Mix'
df['Output'] = df.filter(like='Book').apply(max_val, axis=1)
print(df)
# Output:
Name Book1 Book2 Book3 Book4 Output
0 Tom 20 20 31 31 20
1 nick 21 21 19 21
2 krish 19 12 17 18 Mix
3 jack 18 20 16 16 16
Update
If you use Python >= 3.8
, you can use the walrus operator (avoid a double call to value_counts
:
max_val = lambda x: v.index[0] if (v := x.value_counts()).iloc[0] > 1 else 'Mix'
df['Output'] = df.filter(like='Book').apply(max_val, axis=1)
CodePudding user response:
We can use your idea on mode
to get your desired output. First, we need to convert the relevant columns to numeric data types:
temp = (df
.filter(like='Book')
.apply(pd.to_numeric)
.mode(1)
)
# compute for values
# nulls exist only if there are duplicates
output = np.where(temp.notna().all(1),
# value if True
'Mix',
# if False, pick the first modal value,
temp.iloc[:, 0])
df.assign(output = output)
Name Book1 Book2 Book3 Book4 output
0 Tom 20 20 31 31 20.0
1 nick 21 21 19 21.0
2 krish 19 12 17 18 Mix
3 jack 18 20 16 16 16.0