Replace or change position in columns and sort the index Python Pandas

Time:10-08

I'm having trouble with the ordering of the counts I generate from my df. I run a for loop that calls value_counts on each column, but some columns in the new df are not sorted the same way, so there are columns like this:

[image: screenshot of the resulting counts]

I would like a single order for these columns, either yes, no, don't know or any other order, as long as it is the same for all 3 columns. They also share the same 3 options: Don't know is missing from the first 2 columns only because no such value occurs there, so they show just yes and no. I would like to add Don't know at the end, or copy or replace the options in the other 2 columns so they match column 3. I tried to explain it as best I could. This is my code:

li = []

for i in range(0, len(df.columns)):
    value_counts = df.iloc[:, i].value_counts().to_frame().reset_index()
    li.append(value_counts)

CodePudding user response:

IIUC, use sort_index():

value_counts = df.iloc[:, i].value_counts().sort_index().to_frame().reset_index()
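A small sketch of the effect, using made-up data (the Series `s1` and `s2` are hypothetical, not from the question): value_counts alone orders by frequency, while sort_index orders by label, so both results come out in the same order. A label that never occurs in a column is still absent from its result.

```python
import pandas as pd

s1 = pd.Series(["yes", "no", "yes"])   # 'yes' is the most frequent value
s2 = pd.Series(["no", "no", "yes"])    # 'no' is the most frequent value

# value_counts alone orders by frequency, so the two results differ in order.
print(s1.value_counts().index.tolist())
print(s2.value_counts().index.tolist())

# sort_index orders by label instead, giving the same order for both.
a = s1.value_counts().sort_index()
b = s2.value_counts().sort_index()
print(a.index.tolist())
print(b.index.tolist())
```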

CodePudding user response:

Use:

import numpy as np
import pandas as pd

np.random.seed(202)

L = ['abdef','trasdfg','ssfgh','dfghj','jhgfdsa']
c = np.random.choice(L, size=4)
df = pd.DataFrame(np.random.choice(['yes','no'], size=(3, 4)), columns=c)
print(df)
  dfghj jhgfdsa jhgfdsa abdef
0    no      no      no   yes
1   yes      no      no   yes
2   yes     yes      no    no

If you remove .to_frame().reset_index(), the values stay in the index, so the final DataFrame has one index shared by all columns. No reordering is needed, because the order is then the same for every column. If yes or no is missing from a column, NaN is created there:

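li = []

for i in range(0, len(df.columns)):
    # Keep the values as the index so pd.concat can align on them
    value_counts = df.iloc[:, i].value_counts()
    li.append(value_counts)

# Use a new name so the original df is not overwritten
out = pd.concat(li, axis=1)
print(out)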
     dfghj  jhgfdsa  jhgfdsa  abdef
yes      2        1      NaN      2
no       1        2      3.0      1

If you need the index values repeated next to each column, add Series.reindex with a list of all possible values across all columns, in the order you need:

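li = []

for i in range(0, len(df.columns)):
    # reindex forces the same order of values for every column
    value_counts = df.iloc[:, i].value_counts().reindex(['yes','no']).reset_index()
    li.append(value_counts)

# Use a new name so the original df is not overwritten
out = pd.concat(li, axis=1)
print(out)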
  index  dfghj index  jhgfdsa index  jhgfdsa index  abdef
0   yes      2   yes        1   yes      NaN   yes      2
1    no      1    no        2    no      3.0    no      1
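For the case in the question, where Don't know is missing from some columns, the same idea might look like the sketch below. It assumes the three options are 'yes', 'no' and "Don't know"; the sample data and the names `options` and `counts` are made up for illustration. Passing fill_value=0 to reindex makes a missing option show as 0 instead of NaN.

```python
import pandas as pd

# Hypothetical sample data; only the third column contains "Don't know".
df = pd.DataFrame({
    "q1": ["yes", "no", "yes"],
    "q2": ["no", "no", "yes"],
    "q3": ["yes", "Don't know", "no"],
})

# Assumed full list of options, in the order wanted for every column.
options = ["yes", "no", "Don't know"]

# Count each column, then force the same index order; absent options become 0.
counts = pd.concat(
    [df[col].value_counts().reindex(options, fill_value=0) for col in df.columns],
    axis=1,
    keys=df.columns,
)
print(counts)
```

Because every column is reindexed against the same list, the rows line up without any sorting, and the counts stay integers since no NaN is introduced.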