I have the following dataframe:
df = {'id': [1,2,3,4],
'1': ['Green', 'Green', 'Green', 'Green'],
'2': ['34','67', 'Blue', '77'],
'3': ['Blue', '45', '99', 'Blue'],
'4': ['12', None, '101', '56'],
'5': [None, None, None, '23']}
df = pd.DataFrame(df)
id 1 2 3 4 5
0 1 Green 34 Blue 12 None
1 2 Green 67 45 None None
2 3 Green Blue 99 101 None
3 4 Green 77 Blue 56 23
I would like to creating separate dataframes for GREEN and BLUE with their id number. For example the output of the GREEN DF would be:
id number
1 34
2 67
2 45
4 77
CodePudding user response:
You can try:
# set id as index
df = df.set_index('id')
# mask the color
mask = df.isin(['Green','Blue'])
# label the cells' color with `where` and `ffill`
# print `df.where(mask).ffill(axis=1)` if you want to know how it looks
out = pd.DataFrame({'color':df.where(mask).ffill(axis=1).stack(),
'value':df.mask(mask).stack()
}).dropna()
# blue color
out.query('color=="Blue"')
Output:
color value
id
1 4 Blue 12
3 3 Blue 99
4 Blue 101
4 4 Blue 56
5 Blue 23
Note first level index is the old column, you can drop it if you don't want to.
CodePudding user response:
Check
df.set_index('id',inplace=True)
out = df.where(df.where(lambda x : x.isin(['Green','Blue'])).ffill(axis=1).eq('Green')).stack().loc[lambda x : x!='Green']
Out[571]:
id
1 2 34
2 2 67
3 45
4 2 77
dtype: object
CodePudding user response:
Here is one approach, by stacking first to Series:
# set id as index stack to Series (removing the NaNs)
s = df.set_index('id').stack()
# which values are not Green/Blue?
m = ~s.isin(['Green', 'Blue'])
# ffill and select the non Green/Blue positions
df2 = pd.DataFrame({'color': s.mask(m).ffill().loc[m],
'number': s.loc[m]},
).droplevel(1) # keep level if you want the column labels
output:
color number
id
1 Green 34
1 Blue 12
2 Green 67
2 Green 45
3 Blue 99
3 Blue 101
4 Green 77
4 Blue 56
4 Blue 23
CodePudding user response:
I'd create a dictionary of dataframes with the key of colors like this:
dfm = df.melt('id').sort_values(['id', 'variable'])[['value']]
dfm['group'] = dfm.where(dfm.isin(['Green', 'Blue'])).ffill()['value']
c_dict = dict(tuple(dfm.dropna(how='any').query('value != group').groupby('group')[['value']]))
Output:
#c_dict['Blue']
value
12 12
10 99
14 101
15 56
19 23
and
#c_dict['Green']
value
4 34
5 67
9 45
7 77