pandas dataframe data sorting

Time:12-06

I have a dataset that looks like this:

| year | state   | district | Party |
|------|---------|----------|-------|
| 2010 | haryana | kaithal  | INC   |
| 2010 | haryana | kaithal  | bjp   |
| 2010 | haryana | kaithal  | NOTA  |
| 2010 | goa     | panji    | AAP   |
| 2010 | goa     | panji    | INC   |
| 2010 | goa     | panji    | BJP   |
| 2013 | up      | meerut   | INC   |
| 2013 | up      | meerut   | SP    |
| 2015 | haryana | kaithal  | INC   |
| 2015 | haryana | kaithal  | BJP   |
| 2015 | haryana | kaithal  | AAP   |

I want to rename INC to major for the year 2010 and BJP to major for the year 2015. I need the data in the following form:

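| year | state   | district | Party |
|------|---------|----------|-------|
| 2010 | haryana | kaithal  | major |
| 2010 | haryana | kaithal  | bjp   |
| 2010 | haryana | kaithal  | NOTA  |
| 2010 | goa     | panji    | AAP   |
| 2010 | goa     | panji    | major |
| 2010 | goa     | panji    | BJP   |
| 2013 | up      | meerut   | major |
| 2013 | up      | meerut   | SP    |
| 2015 | haryana | kaithal  | INC   |
| 2015 | haryana | kaithal  | major |
| 2015 | haryana | kaithal  | AAP   |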

I am using this code:

for state in df['state']:
  if state=='haryana':
    for year in df['year']:
      if year==2010:
        df['party'].replace('INC','major',inplace=True)
      else:
         continue
      if year==2015:
        df['party'].replace('BJP','major',inplace=True)
  else:
    continue

But this code takes a long time to run and does not give the desired result: it replaces INC with major in every year and never replaces BJP.

CodePudding user response:

You can use boolean indexing with pandas.DataFrame.loc:

m1 = df["year"].eq(2010) & df["Party"].eq("INC")
m2 = df["year"].eq(2015) & df["Party"].eq("BJP")

df.loc[m1 | m2, "Party"] = "major"

# Output:

print(df.to_string())
    year    state district  Party
0   2010  haryana  kaithal  major
1   2010  haryana  kaithal    bjp
2   2010  haryana  kaithal   NOTA
3   2010      goa    panji    AAP
4   2010      goa    panji  major
5   2010      goa    panji    BJP
6   2013       up   meerut    INC
7   2013       up   meerut     SP
8   2015  haryana  kaithal    INC
9   2015  haryana  kaithal  major
10  2015  haryana  kaithal    AAP
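
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the sample rows from the question, with the column spelled "Party" and no stray whitespace in the values. The comparison is exact and case-sensitive, so the lowercase "bjp" row is left untouched.

```python
import pandas as pd

# Sample rows copied from the question (assumed to be clean strings).
df = pd.DataFrame({
    "year": [2010] * 6 + [2013] * 2 + [2015] * 3,
    "state": ["haryana"] * 3 + ["goa"] * 3 + ["up"] * 2 + ["haryana"] * 3,
    "district": ["kaithal"] * 3 + ["panji"] * 3 + ["meerut"] * 2 + ["kaithal"] * 3,
    "Party": ["INC", "bjp", "NOTA", "AAP", "INC", "BJP",
              "INC", "SP", "INC", "BJP", "AAP"],
})

# Each mask is a boolean Series that is True only on rows meeting both conditions.
m1 = df["year"].eq(2010) & df["Party"].eq("INC")
m2 = df["year"].eq(2015) & df["Party"].eq("BJP")

# Assign only on the selected rows; every other row keeps its value.
df.loc[m1 | m2, "Party"] = "major"

print(df.to_string())
```

If the real data has padded or mixed-case party names, normalizing first (for example with `df["Party"].str.strip()`) may be needed before the masks match.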

CodePudding user response:

Chain the three conditions (state, year, and party) into a mask, then set the new value with DataFrame.loc:

m1 = (df.state=='haryana') & (df['year'] == 2010) & (df['party'] == 'INC')
m2 = (df.state=='haryana') & (df['year'] == 2015) & (df['party'] == 'BJP')
m = m1 | m2

Or:

m = (df.state=='haryana') & ((df['year'] == 2010) & (df['party'] == 'INC') |
                             (df['year'] == 2015) & (df['party'] == 'BJP'))

df.loc[m, 'party'] = 'major'
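
If the number of year/party pairs grows, one possible variation is to keep them in a dictionary and look them up with Series.map instead of writing one mask per pair. This is only a sketch: the `rules` dictionary and the small sample frame below are made up for illustration, and the lowercase column name 'party' is assumed as in the masks above.

```python
import pandas as pd

# Small illustrative frame (hypothetical rows, not the full dataset).
df = pd.DataFrame({
    "year": [2010, 2010, 2013, 2015, 2015, 2010],
    "state": ["haryana", "haryana", "haryana", "haryana", "haryana", "goa"],
    "party": ["INC", "BJP", "INC", "INC", "BJP", "INC"],
})

# Hypothetical rule table: for each year, the party to relabel as 'major'.
rules = {2010: "INC", 2015: "BJP"}

# Look up the target party for each row's year; years without a rule give NaN,
# which never compares equal to a party name.
target = df["year"].map(rules)

# Same idea as the chained masks: state condition AND party matches the rule.
m = df["state"].eq("haryana") & df["party"].eq(target)
df.loc[m, "party"] = "major"

print(df)
```

Dropping the state condition from `m` would apply the rules to every state.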

EDIT: You can check the output of the masks to confirm they work as expected:

m1 = (df['STATE']=='BIHAR') & (df['YEAR']==2010) & ((df['PARTY']=='BJP') | (df['PARTY']=='JD(U)'))
m2 = (df['STATE']=='BIHAR') & (df['YEAR']==2015) & ((df['PARTY']=='RJD') | (df['PARTY']=='JD(U)'))
m3 = (df['STATE']=='BIHAR') & (df['YEAR']==2020) & ((df['PARTY']=='BJP') | (df['PARTY']=='JD(U)'))
m = m1 | m2 | m3
df.loc[m, 'PARTY'] = 'MAJOR'

print (df.assign(m1=m1, m2=m2, m3=m3,triple= m1 | m2 | m3,
                BIHAR = (df['STATE']=='BIHAR'),
                Y2010 = (df['YEAR']==2010)))