Dividing values in columns based on their previous marker-CodePudding

I have the following dataframe:

df = {'id': [1,2,3,4],
    '1': ['Green', 'Green', 'Green', 'Green'],
  '2': ['34','67', 'Blue', '77'],
  '3': ['Blue', '45', '99', 'Blue'],
  '4': ['12', None, '101', '56'],
     '5': [None, None, None, '23']}

df = pd.DataFrame(df)


    id  1       2     3     4       5
0   1   Green   34    Blue  12      None
1   2   Green   67    45    None    None
2   3   Green   Blue  99    101     None
3   4   Green   77    Blue  56      23

I would like to creating separate dataframes for GREEN and BLUE with their id number. For example the output of the GREEN DF would be:

  id   number
   1     34
   2     67
   2     45
   4     77

CodePudding user response：

You can try:

# set id as index
df = df.set_index('id')

# mask the color
mask = df.isin(['Green','Blue'])

# label the cells' color with `where` and `ffill`
# print `df.where(mask).ffill(axis=1)` if you want to know how it looks
out = pd.DataFrame({'color':df.where(mask).ffill(axis=1).stack(),
                    'value':df.mask(mask).stack()
                  }).dropna()

# blue color
out.query('color=="Blue"')

Output:

     color value
id              
1  4  Blue    12
3  3  Blue    99
   4  Blue   101
4  4  Blue    56
   5  Blue    23

Note first level index is the old column, you can drop it if you don't want to.

CodePudding user response：

Check

df.set_index('id',inplace=True)

out = df.where(df.where(lambda x : x.isin(['Green','Blue'])).ffill(axis=1).eq('Green')).stack().loc[lambda x : x!='Green']
Out[571]: 
id   
1   2    34
2   2    67
    3    45
4   2    77
dtype: object

CodePudding user response：

Here is one approach, by stacking first to Series:

# set id as index stack to Series (removing the NaNs)
s = df.set_index('id').stack()

# which values are not Green/Blue?
m = ~s.isin(['Green', 'Blue'])

# ffill and select the non Green/Blue positions
df2 = pd.DataFrame({'color': s.mask(m).ffill().loc[m],
                    'number': s.loc[m]},
                   ).droplevel(1) # keep level if you want the column labels

output:

    color number
id              
1   Green     34
1    Blue     12
2   Green     67
2   Green     45
3    Blue     99
3    Blue    101
4   Green     77
4    Blue     56
4    Blue     23

CodePudding user response：

I'd create a dictionary of dataframes with the key of colors like this:

dfm =  df.melt('id').sort_values(['id', 'variable'])[['value']]

dfm['group'] = dfm.where(dfm.isin(['Green', 'Blue'])).ffill()['value']

c_dict = dict(tuple(dfm.dropna(how='any').query('value != group').groupby('group')[['value']]))

Output:

#c_dict['Blue']
   value
12    12
10    99
14   101
15    56
19    23

and

#c_dict['Green']
  value
4    34
5    67
9    45
7    77