Home > database >  How to Assign muitiple coma separated values in a pandas data frame single column to separate but re
How to Assign muitiple coma separated values in a pandas data frame single column to separate but re

Time:12-30

I have a simple pandas data frame where 'SM_platform' column contains multiple coma separated values like 1,2,7. Now I want to assign these values to separate related columns in the data frame. ex : 1 should be added to column name FB, 2 to Twitter, 3 to Youtube.. etc ...Please advice how to do this task.
Appreciate your help. Thank you

   Age  SM_Platform
0   3   1, 2, 3, 7
1   3   1, 2, 3, 5, 7
2   1   1, 2, 3, 4
3   2   1, 2, 3, 4
4   1   1, 2

Update ------------

With @Corralien answer I get following error

  KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~\Anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

~\Anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'SM_Platform'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-7-b1a706d03e0a> in <module>
      4 
      5 out = df.join(
----> 6     df.pop('SM_Platform').str.split(', ').explode().astype(int)
      7       .replace(m).reset_index().assign(dummy=1)
      8       .pivot_table('dummy', 'index', 'SM_Platform', fill_value=0))

~\Anaconda3\lib\site-packages\pandas\core\frame.py in pop(self, item)
   5224         3  monkey        NaN
   5225         """
-> 5226         return super().pop(item=item)
   5227 
   5228     @doc(NDFrame.replace, **_shared_doc_kwargs)

~\Anaconda3\lib\site-packages\pandas\core\generic.py in pop(self, item)
    868 
    869     def pop(self, item: Hashable) -> Series | Any:
--> 870         result = self[item]
    871         del self[item]
    872 

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 'SM_Platform'

any help please

CodePudding user response:

Using str.get_dummies, and borrowing the labels from Corralien's answer...

labels = {'1': 'Facebook',
          '2': 'Twitter',
          '3': 'Youtube',
          '4': 'Linkedin',
          '5': 'Instagram',
          '6': 'Pinterest',
          '7': 'TikTok'}

df = pd.concat([df, df['SM_Platform'].str.get_dummies(', ').rename(columns=labels)], axis=1)

   Age    SM_Platform  Facebook  Twitter  Youtube  Linkedin  Instagram  TikTok
0    3     1, 2, 3, 7         1        1        1         0          0       1
1    3  1, 2, 3, 5, 7         1        1        1         0          1       1
2    1     1, 2, 3, 4         1        1        1         1          0       0
3    2     1, 2, 3, 4         1        1        1         1          0       0
4    1           1, 2         1        1        0         0          0       0

CodePudding user response:

Create a mapping of your platforms (1 -> Facebook, 2 -> Twitter, etc) then explode your column SM_Platform before replace numeric values by corresponding names. Add a dummy column and pivot your dataframe:

l = ['Facebook', 'Twitter', 'Youtube', 'Linkedin',
     'Instagram', 'Pinterest', 'TikTok']
m = dict(enumerate(l, 1))

out = df.join(
    df['SM_Platform'].str.findall(r'\d ').explode().astype(int)
        .replace(m).reset_index().assign(dummy=1)
        .pivot_table('dummy', 'index', 'SM_Platform', fill_value=0)
)

Output:

>>> out
   Age    SM_Platform  Facebook  Instagram  Linkedin  TikTok  Twitter  Youtube
0    3     1, 2, 3, 7         1          0         0       1        1        1
1    3  1, 2, 3, 5, 7         1          1         0       1        1        1
2    1     1, 2, 3, 4         1          0         1       0        1        1
3    2     1, 2, 3, 4         1          0         1       0        1        1
4    1           1, 2         1          0         0       0        1        0

>>> m
{1: 'Facebook',
 2: 'Twitter',
 3: 'Youtube',
 4: 'Linkedin',
 5: 'Instagram',
 6: 'Pinterest',
 7: 'TikTok'}

Update

"ValueError: invalid literal for int() with base 10: '2,\xa03'"

It seems you have invisible white space '\xa0' so it's probably better to extract all the digits.

  • Related