I have dataframe:
import pandas as pd
data = {'text': ['I ran home', 'I went home', 'I looked at the cat', 'The cat looked at me'],
'word_count':[3,3,4,5]}
df = pd.DataFrame(data)
df['len_text'] = df["text"].str.len()
text word_count len
0 I ran home 3 10
1 I went home 3 11
2 I looked at the cat 4 19
3 The cat looked at me 5 20
I want to filter rows with two conditions:
if the values in the word_count
column are the same and if the value in the len_text
column is greater than the next row, then leave the greater value.
So result will be:
text word_count len
0 I went home 3 11
1 I looked at the cat 4 19
2 The cat looked at me 5 20
I tried to do this but it doesn't work:
for i, row in df.iterrows():
if (df['pub_count'][i] == df['pub_count'][i 1])&(df['len'][i] >= df['df'][i 1]):
df = df.drop(i 1)
CodePudding user response:
You can create groups by consecutive values in word_count
and get indices by DataFrameGroupBy.idxmax
, last select only these rows by DataFrame.loc
:
g = df['word_count'].ne(df['word_count'].shift()).cumsum()
df = df.loc[df.groupby(g)['len_text'].idxmax()]
print (df)
text word_count len_text
1 I went home 3 11
2 I looked at the cat 4 19
3 The cat looked at me 5 20
Consecutive groups means if again e.g. group 3
are count separately:
data = {'text': ['I ran home', 'I went home', 'I looked at the cat',
'The cat looked at me','I ran home', 'I went homes'],
'word_count':[3,3,4,5,3,3]}
df = pd.DataFrame(data)
df['len_text'] = df["text"].str.len()
print (df)
text word_count len_text
0 I ran home 3 10
1 I went home 3 11
2 I looked at the cat 4 19
3 The cat looked at me 5 20
4 I ran home 3 10
5 I went homes 3 12
g = df['word_count'].ne(df['word_count'].shift()).cumsum()
df1 = df.loc[df.groupby(g)['len_text'].idxmax()]
print (df1)
text word_count len_text
1 I went home 3 11
2 I looked at the cat 4 19
3 The cat looked at me 5 20
5 I went homes 3 12
vs.
df2 = df.loc[df.groupby('word_count')['len_text'].idxmax()]
print (df2)
text word_count len_text
5 I went homes 3 12
2 I looked at the cat 4 19
3 The cat looked at me 5 20