I have a simple pandas data frame where 'SM_platform' column contains multiple coma separated values like 1,2,7. Now I want to assign these values to separate related columns in the data frame.
ex : 1 should be added to column name FB,
2 to Twitter,
3 to Youtube.. etc ...Please advice how to do this task.
Appreciate your help. Thank you
Age SM_Platform
0 3 1, 2, 3, 7
1 3 1, 2, 3, 5, 7
2 1 1, 2, 3, 4
3 2 1, 2, 3, 4
4 1 1, 2
Update ------------
With @Corralien answer I get following error
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
~\Anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
~\Anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'SM_Platform'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-7-b1a706d03e0a> in <module>
4
5 out = df.join(
----> 6 df.pop('SM_Platform').str.split(', ').explode().astype(int)
7 .replace(m).reset_index().assign(dummy=1)
8 .pivot_table('dummy', 'index', 'SM_Platform', fill_value=0))
~\Anaconda3\lib\site-packages\pandas\core\frame.py in pop(self, item)
5224 3 monkey NaN
5225 """
-> 5226 return super().pop(item=item)
5227
5228 @doc(NDFrame.replace, **_shared_doc_kwargs)
~\Anaconda3\lib\site-packages\pandas\core\generic.py in pop(self, item)
868
869 def pop(self, item: Hashable) -> Series | Any:
--> 870 result = self[item]
871 del self[item]
872
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
3456 if self.columns.nlevels > 1:
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
3460 indexer = [indexer]
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'SM_Platform'
any help please
CodePudding user response:
Using str.get_dummies, and borrowing the labels from Corralien's answer...
labels = {'1': 'Facebook',
'2': 'Twitter',
'3': 'Youtube',
'4': 'Linkedin',
'5': 'Instagram',
'6': 'Pinterest',
'7': 'TikTok'}
df = pd.concat([df, df['SM_Platform'].str.get_dummies(', ').rename(columns=labels)], axis=1)
Age SM_Platform Facebook Twitter Youtube Linkedin Instagram TikTok
0 3 1, 2, 3, 7 1 1 1 0 0 1
1 3 1, 2, 3, 5, 7 1 1 1 0 1 1
2 1 1, 2, 3, 4 1 1 1 1 0 0
3 2 1, 2, 3, 4 1 1 1 1 0 0
4 1 1, 2 1 1 0 0 0 0
CodePudding user response:
Create a mapping of your platforms (1 -> Facebook, 2 -> Twitter, etc) then explode your column SM_Platform
before replace numeric values by corresponding names. Add a dummy
column and pivot your dataframe:
l = ['Facebook', 'Twitter', 'Youtube', 'Linkedin',
'Instagram', 'Pinterest', 'TikTok']
m = dict(enumerate(l, 1))
out = df.join(
df['SM_Platform'].str.findall(r'\d ').explode().astype(int)
.replace(m).reset_index().assign(dummy=1)
.pivot_table('dummy', 'index', 'SM_Platform', fill_value=0)
)
Output:
>>> out
Age SM_Platform Facebook Instagram Linkedin TikTok Twitter Youtube
0 3 1, 2, 3, 7 1 0 0 1 1 1
1 3 1, 2, 3, 5, 7 1 1 0 1 1 1
2 1 1, 2, 3, 4 1 0 1 0 1 1
3 2 1, 2, 3, 4 1 0 1 0 1 1
4 1 1, 2 1 0 0 0 1 0
>>> m
{1: 'Facebook',
2: 'Twitter',
3: 'Youtube',
4: 'Linkedin',
5: 'Instagram',
6: 'Pinterest',
7: 'TikTok'}
Update
"ValueError: invalid literal for int() with base 10: '2,\xa03'"
It seems you have invisible white space '\xa0'
so it's probably better to extract all the digits.