cols_df
represents the chunk of the DataFrame in which I want to be able to perform multiple operations but, each time, target two columns along with the first one. I sort the selected columns from small to large in each round (ex. '0', '2', '3' columns in first round, and '0', '4', '5' columns in second round). In a new column, I mark each row with an X if it does not contain numerical values in one of the two targeted columns. I go ahead applying this for each pair of cols_df
's columns. Then, I will have a DataFrame containing the newly marked column along with all the other columns.
Input:
import pandas as pd
cols_dict = {'matr': {0: '18I1', 1: '03I2', 2: '03I3', 3: '18I4', 4: '03I5', 5: '03I6', 6: '03I7', 7: '03I8', 8: '18I9', 9: '18I0'}, 'cat': {0: '3', 1: '3', 2: '3', 3: '3', 4: '3', 5: '18', 6: '3', 7: '3', 8: '3', 9: '3'}, 'Unnamed: 5': {0: 81, 1: 81, 2: 81, 3: 77, 4: None, 5: None, 6: 83, 7: 81, 8: 79, 9: 81}, 'Unnamed: 6': {0: 91, 1: 97, 2: 97, 3: 91, 4: None, 5: 93, 6: 89, 7: 83, 8: 81, 9: 99}, 'Unnamed: 7': {0: 117.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 115.0, 5: None, 6: 115.0, 7: 115.0, 8: 115.0, 9: 115.0}, 'Unnamed: 8': {0: 123.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 123.0, 5: 123.0, 6: 125.0, 7: 123.0, 8: 117.0, 9: None}}
cols_df = pd.DataFrame.from_dict(cols_dict)
The desired outout:
cols_dict_out = {'matr': {0: '18I1', 1: '03I2', 2: '03I3', 3: '18I4', 4: '03I5', 5: '03I6', 6: '03I7', 7: '03I8', 8: '18I9', 9: '18I0'}, 'xs': {0: None, 1: None, 2: None, 3: None, 4: None, 5: 'X', 6: None, 7: None, 8: None, 9: 'X'}, 'cat': {0: '3', 1: '3', 2: '3', 3: '3', 4: '3', 5: '18', 6: '3', 7: '3', 8: '3', 9: '3'}, 'Unnamed: 5': {0: 81, 1: 81, 2: 81, 3: 77, 4: None, 5: None, 6: 83, 7: 81, 8: 79, 9: 81}, 'Unnamed: 6': {0: 91, 1: 97, 2: 97, 3: 91, 4: None, 5: 93, 6: 89, 7: 83, 8: 81, 9: 99}, 'Unnamed: 7': {0: 117.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 115.0, 5: None, 6: 115.0, 7: 115.0, 8: 115.0, 9: 115.0}, 'Unnamed: 8': {0: 123.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 123.0, 5: 123.0, 6: 125.0, 7: 123.0, 8: 117.0, 9: None}}
cols_out_df = pd.DataFrame.from_dict(cols_dict_out)
CodePudding user response:
updated answer
pandas
# get columns 2/3 or 4/5 and check if NaN
m1 = cols_df.iloc[:, [2,3]].isna()
m2 = cols_df.iloc[:, [4,5]].isna()
# check if all values are homogeneous
m = ( m1.eq(m1.iloc[:, 0], axis=0).all(axis=1)
& m2.eq(m2.iloc[:, 0], axis=0).all(axis=1)
)
cols_df['xs'] = np.where(m, None, 'X')
generic answer with numpy:
N = 2
a = cols_df.filter(like='Unnamed').isna().to_numpy()
b = a.reshape((a.shape[0], N, -1,))
m = (b == b[..., [0]]).all(-1).all(1)
cols_df.insert(1, 'xs', np.where(m, None, 'X'))
Output:
matr xs cat Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8
0 18I1 None 3 81.0 91.0 117.0 123.0
1 03I2 None 3 81.0 97.0 115.0 115.0
2 03I3 None 3 81.0 97.0 115.0 115.0
3 18I4 None 3 77.0 91.0 115.0 115.0
4 03I5 None 3 NaN NaN 115.0 123.0
5 03I6 X 18 NaN 93.0 NaN 123.0
6 03I7 None 3 83.0 89.0 115.0 125.0
7 03I8 None 3 81.0 83.0 115.0 123.0
8 18I9 None 3 79.0 81.0 115.0 117.0
9 18I0 X 3 81.0 99.0 115.0 NaN
original answer
The logic is no fully clear, but it looks like you might want:
cols_df['xs'] = np.where(cols_df.filter(like='Unnamed').isna().any(axis=1), 'X', None)
Or, if you want to insert as second column:
cols_df.insert(1, 'xs', np.where(cols_df.filter(like='Unnamed').isna().any(axis=1), 'X', None))
Output:
matr xs cat Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8
0 18I1 None 3 81.0 91 117.0 123.0
1 03I2 None 3 81.0 97 115.0 115.0
2 03I3 None 3 81.0 97 115.0 115.0
3 18I4 None 3 77.0 91 115.0 115.0
4 03I5 X 3 NaN 81 115.0 123.0
5 03I6 X 18 NaN 93 NaN 123.0
6 03I7 None 3 83.0 89 115.0 125.0
7 03I8 None 3 81.0 83 115.0 123.0
8 18I9 None 3 79.0 81 115.0 117.0
9 18I0 X 3 81.0 99 115.0 NaN