I'm having trouble describing exactly what I want to achieve. I've tried looking here on stack to find others with the same problem, but are unable to find any. So I will try to describe exactly what I want and give you a sample setup code.
I would like to have a function that gives me a new column/pd.Series. This new column has boolean TRUE values (or int's) that are based on a certain condition.
The condition being as follows. There are N number of columns (example is 8), each with the same name but ending with one new number. IE, column_1, column_2 etc. The function I need is twofold:
- If N is given, look for/through each column row and see if it and the next N columns row are also TRUE/1 ..
- If N is NOT given, look for each column row and if all next columns rows are also TRUE/1, with the numbers as ID's to look at the column.
def get_df_series(df: pd.DataFrame, columns_ids: list, n: int=8) -> pd.Dataframe:
for i in columns_ids:
# missing code here .. i dont know if this would be the way to go
pass
return df
def create_dataframe(numbers: list) -> pd.DataFrame:
df = pd.DataFrame() # empty df
# create a column for each number with the number as ID and with random boolean values as int's
for i in numbers:
df[f'column_{i}'] = np.random.randint(2, size=20)
return df
if __name__=="__main__":
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
df = create_dataframe(numbers=numbers)
df = get_df_series(df=df, numbers=numbers, n=3)
I have some experience with Pandas dataframes and know how to create IF/ELSE things with np.select for example.
(function) select(condlist: Sequence[ArrayLike], choicelist: Sequence[ArrayLike], default: ArrayLike = ...) -> NDArray
The problem I'm running into is that I don't know how to make a conditional statement if I don't know how many columns are ahead. For example, if I want to know for column_5 if the next 3 are also true, I can hardcode this, but I have columns up to id 20 and would love to not have to hardcode everything from column_1 to column_20 if I want to know if all conditions in all those columns are true.
Now the problem is that I don't know if this is even possible. So any feedback would be appreaciated. Even just giving me a hint on where to look for a way to do this.
EDIT: What I forgot to mention was that there will be random columns in between that obviously cannot be taking into the equation. For example, there will be main_column_1, main_column_2, main_column_3, side_column_1, side_column_2, right_column_1, main_column_3, main_column_4 etc...
The answer Corralien gave is correct, but I should've made my question more clearer.
I need to be able to, say, look at main_column and for that one look ahead N amount of columns of the same type: main_column.
CodePudding user response:
Try:
n = 3
out = (df.rolling(n, min_periods=1, axis=1).sum()
.shift(-n 1, fill_value=0, axis=1).eq(n).astype(int)
.rename(columns=lambda x: 'result_' x.split('_')[1]))
Output:
>>> out
result_1 result_2 result_3 result_4 result_5 result_6 result_7 result_8
0 1 1 1 1 1 1 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0
5 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
8 0 1 1 1 0 0 0 0
9 0 0 0 0 0 1 0 0
10 0 0 0 0 0 0 0 0
11 0 0 0 0 1 0 0 0
12 0 0 0 0 0 0 0 0
13 0 0 0 1 1 0 0 0
14 0 0 0 0 0 1 0 0
15 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0
17 0 0 1 0 0 0 0 0
18 0 0 1 0 0 0 0 0
19 0 0 0 0 0 0 0 0
Input:
>>> df
column_1 column_2 column_3 column_4 column_5 column_6 column_7 column_8
0 1 1 1 1 1 1 1 1
1 0 1 0 0 0 1 1 0
2 1 1 0 1 0 1 1 0
3 1 0 1 0 0 0 0 0
4 1 0 0 1 1 1 0 1
5 1 1 0 1 0 1 1 0
6 1 0 1 0 0 0 0 1
7 0 0 1 0 0 0 0 0
8 0 1 1 1 1 1 0 0
9 1 0 1 1 0 1 1 1
10 0 0 1 1 0 0 1 1
11 1 0 1 0 1 1 1 0
12 0 1 1 0 1 0 1 0
13 0 0 0 1 1 1 1 0
14 0 0 1 1 0 1 1 1
15 1 0 0 1 0 1 0 0
16 1 0 0 0 0 0 0 1
17 0 0 1 1 1 0 0 1
18 0 0 1 1 1 0 0 1
19 0 0 1 0 0 0 1 0