I have the following table:
id | summary | summary_len | apple | book | computer |
---|---|---|---|---|---|
1 | .... | 210 | 2 | 1 | 0 |
2 | ... | 120 | 3 | 0 | 1 |
3 | ... | 50 | 2 | 2 | 1 |
summary is a description, summary_len is the length of that description, and the remaining columns (apple/book/computer) are keywords; the values in those columns are the number of times each keyword occurs in the description.
I need to normalize this table by finding the max value PER COLUMN (vertically) and then dividing every value in that column by it, so the output looks like the table below (I wrote it as 2/3 just to emphasize the max value per column):
id | summary | summary_len | apple | book | computer |
---|---|---|---|---|---|
1 | .... | 210 | 2/3 | 1/2 | 0/1 |
2 | ... | 120 | 3/3 | 0/2 | 1/1 |
3 | ... | 50 | 2/3 | 2/2 | 1/1 |
My problem is that I don't need the max of every column, only of the keyword columns whose occurrences I am counting. I stored the keywords in a list and got the max value per column:
max_per_col = df_freq[keywords].max()
max_per_col
this is how it looks (with the original data):
Could you help me apply it "back" to the original dataframe and divide each keyword column by its max value?
CodePudding user response:
You can select only the keyword columns and divide them by their per-column maximum values:
keywords = ['apple','book','computer']
df_freq[keywords] /= df_freq[keywords].max()
# equivalent to:
# df_freq[keywords] = df_freq[keywords] / df_freq[keywords].max()
print(df_freq)
id summary_len apple book computer
0 1 210 0.666667 0.5 0.0
1 2 120 1.000000 0.0 1.0
2 3 50 0.666667 1.0 1.0
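The division works because pandas aligns the index of the Series returned by max() with the column labels of the dataframe, so each column is divided by its own maximum. A minimal self-contained sketch of the same idea follows, reusing the max_per_col Series from the question. The sample data is rebuilt from the table above (without the summary text), and the names safe_max and normalized are only illustrative. Replacing a zero maximum with 1 is an assumption about the desired behaviour for a keyword that never occurs; without it, that column would become NaN (0/0).

```python
import pandas as pd

# Sample data rebuilt from the table in the question (summary text omitted).
df_freq = pd.DataFrame({
    "id": [1, 2, 3],
    "summary_len": [210, 120, 50],
    "apple": [2, 3, 2],
    "book": [1, 0, 2],
    "computer": [0, 1, 1],
})
keywords = ["apple", "book", "computer"]

# Series indexed by keyword name, holding each column's maximum.
max_per_col = df_freq[keywords].max()

# Assumption: a keyword that never occurs (max == 0) should stay 0, not NaN,
# so a zero maximum is replaced with 1 before dividing.
safe_max = max_per_col.replace(0, 1)

# Divide each keyword column by its own maximum; other columns are untouched.
normalized = df_freq.copy()
normalized[keywords] = df_freq[keywords].div(safe_max, axis="columns")
print(normalized)
```

Working on a copy keeps the raw counts in df_freq available; if they are not needed afterwards, the in-place `/=` form from the answer is shorter.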