Input:
(Having error in uploading image, otherwise I always do.)
import pandas as pd
df = pd.DataFrame(
{
'keyword': ['app store', 'app marketing', 'app store', 'app marketing'],
'rank': [10, 12, 10, 12],
'impression': [100, 200, 100, 200],
'landing page': ['ngl.com/123', 'ngl.com/123', 'oats.com/123', 'oats.com/123']
}
)
df
Output:
import pandas as pd
df = pd.DataFrame(
{
'keyword': ['app', 'store', 'marketing', 'app', 'store', 'marketing'],
'mean_rank': [11, 10, 12, 11, 10, 12],
'impression': [300, 100, 200, 300, 100, 200],
'landing page': ['ngl.com/123', 'ngl.com/123', 'ngl.com/123', 'oats.com/123', 'oats.com/123', 'oats.com/123'],
'keyword_length':[3, 5, 9, 3, 5, 9],
'impression_per_char':[75, 16.67, 20, 75, 16.67, 20]
}
)
df
Maybe this could be used to find words in keyword:
words = 'app store'
re.findall('\w ', words.casefold())
mean_rank = Mean rank of the word in keyword.
keyword_length = length of the word in keyword.
impression_per_char = Impression/(keyword_length 1)
Actual dataset has 10,000 rows. This one is made by me, please tell if something is wrong with it. I'll be parallelly working on this for the next few hours.
Also, for 'mean_rank' column, you can take weighted mean or some made up equation that (maybe also) uses 'impression', 'keyword_length' and/or 'impression_per_char' to find a sensible rank. If you do so, then I'll select that as final answer instead.
CodePudding user response:
Use Series.str.casefold
with Series.str.split
and DataFrame.explode
for separate words, get legths of words by Series.str.len
, then aggregate sum
and mean
and last create impression_per_char
column:
df = df.assign(keyword = df['keyword'].str.casefold().str.split()).explode('keyword')
df['keyword_length'] = df['keyword'].str.len()
df = (df.groupby(['keyword','landing page', 'keyword_length' ], as_index=False, sort=False)
.agg(mean_rank=('rank','mean'), impression=('impression', 'sum')))
df['impression_per_char'] = df['impression'].div(df['keyword_length'].add(1))
print (df)
keyword landing page keyword_length mean_rank impression \
0 app ngl.com/123 3 11 300
1 store ngl.com/123 5 10 100
2 marketing ngl.com/123 9 12 200
3 app oats.com/123 3 11 300
4 store oats.com/123 5 10 100
5 marketing oats.com/123 9 12 200
impression_per_char
0 75.000000
1 16.666667
2 20.000000
3 75.000000
4 16.666667
5 20.000000