I have a dataframe like this:
import numpy as np
import pandas as pd
from collections import Counter
df = pd.DataFrame({'c0': ['app','e','i','owl','u'],'c1': ['p','app','i','g',''],'c2': ['g','p','app','owl','']})
df
    c0   c1   c2
0  app    p    g
1    e  app    p
2    i    i  app
3  owl    g  owl
4    u
I would like to align the rows based on frequency of items.
Required dataframe with quantities:
    c0   c1   c2
0  app  app  app
1    i    i
2  owl       owl
3    e    p    p
4    u    g    g
My attempt
all_cols = df.values.flatten()
all_cols = [i for i in all_cols if i]
freq = Counter(all_cols)
freq
Answer:
I can get you this far:
import pandas as pd
df = pd.DataFrame({'c0': list('aeiou'),'c1': ['p','a','i','g',''],'c2': ['g','p','a','o','']})
allLetters = set(x for x in df.to_numpy().flatten() if x)
binaryIncidence = []
for letter in allLetters:
    # 1 if the letter occurs anywhere in that column, else 0
    binaryIncidence.append(tuple(int(letter in df[col].tolist()) for col in df.columns))
x = list(zip(allLetters, binaryIncidence))
x.sort(key=lambda y:(y[1], -ord(y[0])), reverse=True)
x = [[y[0] if b else '' for b in y[1]] for y in x]
df_results = pd.DataFrame(x, columns=df.columns)
print(df_results)
... with this output:
  c0 c1 c2
0  a  a  a
1  i  i
2  o     o
3  e
4  u
5     g  g
6     p  p
However, the sample output in your question puts 'e' on the same row as 'p', 'p', and 'u' on the same row as 'g', 'g'. It's not clear to me what rule decides that pairing.
UPDATE: generalize to strings of arbitrary length
This version works with strings of any length, not just single characters:
import pandas as pd
df = pd.DataFrame({'c0': ['app','e','i','owl','u'],'c1': ['p','app','i','g',''],'c2': ['g','p','app','owl','']})
allStrings = set(x for x in df.to_numpy().flatten() if x)
binaryIncidence = []
for s in allStrings:
    # 1 if the string occurs anywhere in that column, else 0
    binaryIncidence.append(tuple(int(s in df[col].tolist()) for col in df.columns))
x = list(zip(allStrings, binaryIncidence))
x.sort(key=lambda y:(tuple(-b for b in y[1]), y[0]))
x = [[y[0] if b else '' for b in y[1]] for y in x]
df_results = pd.DataFrame(x, columns=df.columns)
print(df_results)
Output:
    c0   c1   c2
0  app  app  app
1    i    i
2  owl       owl
3    e
4    u
5         g    g
6         p    p
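If the goal is only to fill the blanks so the result has fewer rows, one possible rule is a greedy merge: take the sorted rows in order and drop each one into the first earlier row whose filled columns don't overlap with it. This is a rough sketch under that assumption, not something stated in the question; pack_rows is a name I made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'c0': ['app', 'e', 'i', 'owl', 'u'],
                   'c1': ['p', 'app', 'i', 'g', ''],
                   'c2': ['g', 'p', 'app', 'owl', '']})

# One row per distinct string, blank in the columns where it does not occur.
strings = sorted(set(x for x in df.to_numpy().flatten() if x))
rows = [[s if s in df[col].tolist() else '' for col in df.columns] for s in strings]

# Same ordering as before: filled leftmost columns first, ties broken alphabetically.
rows.sort(key=lambda r: (tuple(v == '' for v in r), max(r)))

def pack_rows(rows):
    """Greedy merge (assumed rule, hypothetical helper): place each row into the
    first earlier row whose filled columns do not overlap with it; otherwise
    start a new row."""
    packed = []
    for row in rows:
        for target in packed:
            # Two rows can share a line only if no column is filled in both.
            if all(not (a and b) for a, b in zip(target, row)):
                for i, v in enumerate(row):
                    if v:
                        target[i] = v
                break
        else:
            packed.append(list(row))
    return packed

df_packed = pd.DataFrame(pack_rows(rows), columns=df.columns)
print(df_packed)
```

With the sample data this gives five rows, but 'g', 'g' lands beside 'e' and 'p', 'p' beside 'u', which is the reverse of the pairing in the question. Matching the question exactly would need a tie-breaking rule that the question doesn't spell out.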