I have a dataframe with a list of strings and I would like to add columns with the number of occurrences of a character, sorted with the maximum to minimum occurrences The datafrae is very big so I need an efficient way to calculate it
Originale df:
Item
0 ABABCBF
1 ABABCGH
2 ABABEFR
3 ABABFBF
4 ABACTC3
Wanted df:
Item o1 o2 o3 o4 o5
0 ABABCBF 3 2 1 1 null
1 ABABCGH 2 2 1 1 1
2 ABABEFR 2 2 1 1 1
3 ABABFBF 3 2 2 null null
4 ABACTC3 2 2 1 1 1
I have tried using collection counter but I am not able to convert the result in the column of the dataframe collections.Counter(df['item'])
Thanks
CodePudding user response:
You can use collections.Counter
and the DataFrame
constructor:
from collections import Counter
out = df.join(pd.DataFrame(
sorted(Counter(x).values(), reverse=True)
for x in df['Item'])
.rename(columns=lambda x: f'o{x 1}')
)
print(out)
Output:
Item o1 o2 o3 o4 o5
0 ABABCBF 3 2 1 1.0 NaN
1 ABABCGH 2 2 1 1.0 1.0
2 ABABEFR 2 2 1 1.0 1.0
3 ABABFBF 3 2 2 NaN NaN
4 ABACTC3 2 2 1 1.0 1.0
CodePudding user response:
Try:
import json
import pandas as pd
from collections import Counter
df = pd.DataFrame({'Item': ['ABACABDF', 'BACBDFHGAAAA']})
result = df.join(
pd.DataFrame(
json.loads(
df['Item']
.transform(lambda x: sorted(list(Counter(x).values()), reverse=True))
.to_json(orient='records')
)
)
.rename(columns=(lambda x: f'o{x 1}'))
)
result
Item o1 o2 o3 o4 o5 o6 o7
0 ABACABDF 3 2 1 1 1 NaN NaN
1 BACBDFHGAAAA 5 2 1 1 1 1.0 1.0
CodePudding user response:
try this:
def count_chars(txt: str):
ser = pd.Series([*txt])
result = ser.value_counts().tolist()
return result
result = df.join(
pd.DataFrame([*df['Item'].apply(count_chars)]).rename(columns=lambda x: f'o{x 1}'))
print(result)
>>>
Item o1 o2 o3 o4 o5
0 ABABCBF 3 2 1 1.0 NaN
1 ABABCGH 2 2 1 1.0 1.0
2 ABABEFR 2 2 1 1.0 1.0
3 ABABFBF 3 2 2 NaN NaN
4 ABACTC3 2 2 1 1.0 1.0