Iterate over rows while targeting columns two at a time-CodePudding

cols_df represents the chunk of the DataFrame in which I want to be able to perform multiple operations but, each time, target two columns along with the first one. I sort the selected columns from small to large in each round (ex. '0', '2', '3' columns in first round, and '0', '4', '5' columns in second round). In a new column, I mark each row with an X if it does not contain numerical values in one of the two targeted columns. I go ahead applying this for each pair of cols_df's columns. Then, I will have a DataFrame containing the newly marked column along with all the other columns.

Input:

import pandas as pd

cols_dict = {'matr': {0: '18I1', 1: '03I2', 2: '03I3', 3: '18I4', 4: '03I5', 5: '03I6', 6: '03I7', 7: '03I8', 8: '18I9', 9: '18I0'}, 'cat': {0: '3', 1: '3', 2: '3', 3: '3', 4: '3', 5: '18', 6: '3', 7: '3', 8: '3', 9: '3'}, 'Unnamed: 5': {0: 81, 1: 81, 2: 81, 3: 77, 4: None, 5: None, 6: 83, 7: 81, 8: 79, 9: 81}, 'Unnamed: 6': {0: 91, 1: 97, 2: 97, 3: 91, 4: None, 5: 93, 6: 89, 7: 83, 8: 81, 9: 99}, 'Unnamed: 7': {0: 117.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 115.0, 5: None, 6: 115.0, 7: 115.0, 8: 115.0, 9: 115.0}, 'Unnamed: 8': {0: 123.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 123.0, 5: 123.0, 6: 125.0, 7: 123.0, 8: 117.0, 9: None}}  

cols_df = pd.DataFrame.from_dict(cols_dict)

The desired outout:

cols_dict_out = {'matr': {0: '18I1', 1: '03I2', 2: '03I3', 3: '18I4', 4: '03I5', 5: '03I6', 6: '03I7', 7: '03I8', 8: '18I9', 9: '18I0'}, 'xs': {0: None, 1: None, 2: None, 3: None, 4: None, 5: 'X', 6: None, 7: None, 8: None, 9: 'X'}, 'cat': {0: '3', 1: '3', 2: '3', 3: '3', 4: '3', 5: '18', 6: '3', 7: '3', 8: '3', 9: '3'}, 'Unnamed: 5': {0: 81, 1: 81, 2: 81, 3: 77, 4: None, 5: None, 6: 83, 7: 81, 8: 79, 9: 81}, 'Unnamed: 6': {0: 91, 1: 97, 2: 97, 3: 91, 4: None, 5: 93, 6: 89, 7: 83, 8: 81, 9: 99}, 'Unnamed: 7': {0: 117.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 115.0, 5: None, 6: 115.0, 7: 115.0, 8: 115.0, 9: 115.0}, 'Unnamed: 8': {0: 123.0, 1: 115.0, 2: 115.0, 3: 115.0, 4: 123.0, 5: 123.0, 6: 125.0, 7: 123.0, 8: 117.0, 9: None}}

cols_out_df = pd.DataFrame.from_dict(cols_dict_out)

CodePudding user response：

updated answer

pandas

# get columns 2/3 or 4/5 and check if NaN
m1 = cols_df.iloc[:, [2,3]].isna()
m2 = cols_df.iloc[:, [4,5]].isna()

# check if all values are homogeneous
m = ( m1.eq(m1.iloc[:, 0], axis=0).all(axis=1)
    & m2.eq(m2.iloc[:, 0], axis=0).all(axis=1)
    )

cols_df['xs'] = np.where(m, None, 'X')

generic answer with numpy:

N = 2
a = cols_df.filter(like='Unnamed').isna().to_numpy()
b = a.reshape((a.shape[0], N, -1,))
m = (b == b[..., [0]]).all(-1).all(1)

cols_df.insert(1, 'xs', np.where(m, None, 'X'))

Output:

   matr    xs cat  Unnamed: 5  Unnamed: 6  Unnamed: 7  Unnamed: 8
0  18I1  None   3        81.0        91.0       117.0       123.0
1  03I2  None   3        81.0        97.0       115.0       115.0
2  03I3  None   3        81.0        97.0       115.0       115.0
3  18I4  None   3        77.0        91.0       115.0       115.0
4  03I5  None   3         NaN         NaN       115.0       123.0
5  03I6     X  18         NaN        93.0         NaN       123.0
6  03I7  None   3        83.0        89.0       115.0       125.0
7  03I8  None   3        81.0        83.0       115.0       123.0
8  18I9  None   3        79.0        81.0       115.0       117.0
9  18I0     X   3        81.0        99.0       115.0         NaN

original answer

The logic is no fully clear, but it looks like you might want:

cols_df['xs'] = np.where(cols_df.filter(like='Unnamed').isna().any(axis=1), 'X', None)

Or, if you want to insert as second column:

cols_df.insert(1, 'xs', np.where(cols_df.filter(like='Unnamed').isna().any(axis=1), 'X', None))

Output:

   matr    xs cat  Unnamed: 5  Unnamed: 6  Unnamed: 7  Unnamed: 8
0  18I1  None   3        81.0          91       117.0       123.0
1  03I2  None   3        81.0          97       115.0       115.0
2  03I3  None   3        81.0          97       115.0       115.0
3  18I4  None   3        77.0          91       115.0       115.0
4  03I5     X   3         NaN          81       115.0       123.0
5  03I6     X  18         NaN          93         NaN       123.0
6  03I7  None   3        83.0          89       115.0       125.0
7  03I8  None   3        81.0          83       115.0       123.0
8  18I9  None   3        79.0          81       115.0       117.0
9  18I0     X   3        81.0          99       115.0         NaN