Home > Software design >  pandas, combine two rows to one, only for specific values, with an updated value
pandas, combine two rows to one, only for specific values, with an updated value

Time:09-29

Current dataframe is as follows:

df = pd.read_csv('filename.csv', delimiter=',')

print(df)
idx   uniqueID      String 

0        1           'hello'
1        1           'goodbye'
2        1           'yo'
4        2           'hello'
5        2           'sup'
6        3           'howdy'
7        3           'hello'
8        3           'goodbye'

Expected Output:

idx   uniqueID      String 

0        1           'greeting'
2        1           'yo'
4        2           'hello'
5        2           'sup'
6        3           'howdy'
7        3           'greeting'

Effectively, if there is a consecutive pattern of 'hello' then 'goodbye' within the same uniqueID, they get merged to a single row and become a different string (which in this case I allocated to be 'greeting').

Question: How do I combine two rows to one, only for specific values, with an updated value I want?

I'm thinking use of .loc would be ideal here, but unsure how to go about it. Thank you!

CodePudding user response:

In your case do

idx = df.groupby('uniqueID')['String'].apply(lambda x : x.shift().eq('hello')&x.eq('goodbye'))
out = df.drop(idx.index[idx],axis=0)
out.loc[idx.index[idx]-1,'String'] = 'greeting'
out
Out[79]: 
   idx  uniqueID    String
0    0         1  greeting
2    2         1        yo
3    4         2     hello
4    5         2       sup
5    6         3     howdy
6    7         3  greeting

CodePudding user response:

here is one more way do it,

#using mask check if the consecutive values are hello and goodbye, and change the first value to greetings
df['String']=df['String'].mask(df['uniqueID'].eq(df['uniqueID'].shift(-1)) &
                               df['String'].str[1:-1].eq('hello') & 
                               df['String'].shift(-1).str[1:-1].eq('goodbye'),'greetings')
df


# identify the row above the greetings and drop it
df.drop(df.loc[df['String'].shift().eq('greetings') ].index)
    idx     uniqueID    String
0   0   1   greetings
2   2   1   'yo'
3   4   2   'hello'
4   5   2   'sup'
5   6   3   'howdy'
6   7   3   greetings
  • Related