Say I have a DataFrame pd with a column called 'elements' which contains a list of a list of objects as shown below:
print(df2['elements'])
0 [Element B, Element Cr, Element Re]
1 [Element B, Element Rh, Element Sc]
2 [Element B, Element Mo, Element Y]
3 [Element Al, Element B, Element Lu]
4 [Element B, Element Dy, Element Os]
Name: elements, Length: 1763, dtype: object
I would like to count how many times each string within the whole column, so like in the example above the count for the string 'Element B' is 5 or for 'Element Mo' is 1.
I have tried setting up a dictionary below, but this just counts the each list instead of the strings that are in them.
elements_count_dict = {}
for entry in df2['elements']:
for object in entry:
if object in elements_count_dict:
elements_count_dict[object] = 1
else:
elements_count_dict[object] = 0
However, doing it this way the dictionary of tracks each individual character instead of the strings i.e. [ = 5 and ] = 5 and even after converting the column to string using df2['elements']to_string()
it still doesn't work.
CodePudding user response:
Try as follows:
- First, use
Series.replace
to replace[
and]
in your strings with''
(regex:r\[|\]
). - Second, use
Series.str.split
to split the string on,
(i.e.,\s
). - Third, use
Series.explode
to put each item on its own row. - Finally, apply
Series.value_counts
to return apd.Series
with a count for each item in your lists (in order of appearance).
import pandas as pd
data = {'elements': {0: '[Element B, Element Cr, Element Re]',
1: '[Element B, Element Rh, Element Sc]',
2: '[Element B, Element Mo, Element Y]',
3: '[Element Al, Element B, Element Lu]',
4: '[Element B, Element Dy, Element Os]'}}
df = pd.DataFrame(data)
counts = df.elements.str.replace(r'\[|\]','', regex=True)\
.str.split(',\s').explode().value_counts()
print(counts)
Element B 5
Element Cr 1
Element Re 1
Element Rh 1
Element Sc 1
Element Mo 1
Element Y 1
Element Al 1
Element Lu 1
Element Dy 1
Element Os 1
Name: elements, dtype: int64
# to turn that into a dict, simply use:
# d = counts.to_dict()
CodePudding user response:
First use np.ravel
to flatten the nested list coming from df.elements.to_list
and then use collections.Counter
instead of a loop:
import numpy as np
from collections import Counter
ravel = np.ravel(df.elements.to_list())
Counter(ravel)
CodePudding user response:
Here is one way to do it
may not be very elegant, but works
df['col1'].str.replace(r'\[|\]','', regex=True).str.split(',').explode().str.strip().to_frame().groupby('col1').value_counts()
col1
Element Al 1
Element B 5
Element Cr 1
Element Dy 1
Element Lu 1
Element Mo 1
Element Os 1
Element Re 1
Element Rh 1
Element Sc 1
Element Y 1
dtype: int64
Data Used
data={'col1': {0: '[Element B, Element Cr, Element Re]',
1: '[Element B, Element Rh, Element Sc]',
2: '[Element B, Element Mo, Element Y]',
3: '[Element Al, Element B, Element Lu]',
4: '[Element B, Element Dy, Element Os]'}}
df=pd.DataFrame(data)
df