Pandas .iloc indexing coupled with boolean indexing in a DataFrame

Time:01-29

I looked into existing threads regarding indexing, but none of them address the present use case.

I would like to alter specific values in a DataFrame based on their position. Specifically, I'd like the values in the second column, first through fourth rows, to be NaN, and the values in the third column, first and second rows, to be NaN. Say we have the following `DataFrame`:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.standard_normal((7,3)))
print(df)
          0         1         2
0 -1.102888  1.293658 -2.290175
1 -1.826924 -0.661667 -1.067578
2  1.015479  0.058240 -0.228613
3 -0.760368  0.256324 -0.259946
4  0.496348  0.437496  0.646149
5  0.717212  0.481687 -2.640917
6 -0.141584 -1.997986  1.226350

I want to alter df as shown below with the least amount of code:

          0         1         2
0 -1.102888       NaN       NaN
1 -1.826924       NaN       NaN
2  1.015479       NaN -0.228613
3 -0.760368       NaN -0.259946
4  0.496348  0.437496  0.646149
5  0.717212  0.481687 -2.640917
6 -0.141584 -1.997986  1.226350

I tried using boolean indexing with .loc, but it resulted in an error:

df.loc[(:2,1:) & (2:4,1)] = np.nan

# exception message:
df.loc[(:2,1:) & (2:4,1)] = np.nan
            ^
SyntaxError: invalid syntax

I also thought about converting the DataFrame to a NumPy ndarray, but then I wouldn't know how to use boolean indexing in that case.

CodePudding user response:

One way is to define the requirement explicitly as a mapping and assign in a loop, which keeps the intent clear:

# Map column position -> number of leading rows to set to NaN
d = {1: 4, 2: 2}
for col, n_rows in d.items():
    df.iloc[:n_rows, col] = np.nan

print(df)

          0         1         2
0 -1.102888       NaN       NaN
1 -1.826924       NaN       NaN
2  1.015479       NaN -0.228613
3 -0.760368       NaN -0.259946
4  0.496348  0.437496  0.646149
5  0.717212  0.481687 -2.640917
6 -0.141584 -1.997986  1.226350
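
Since the question also asks about boolean indexing, here is a minimal sketch of that alternative, assuming a same-shaped NumPy boolean mask is acceptable. The names `rng`, `mask`, and `result` and the seed are illustrative choices, not part of the original post, and the random values will differ from those printed above:

```python
import numpy as np
import pandas as pd

# Stand-in for the 7x3 frame in the question (the seed is an arbitrary assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((7, 3)))

# Boolean mask with the same shape as df; True marks the cells to blank out.
mask = np.zeros(df.shape, dtype=bool)
mask[:4, 1] = True  # second column, first four rows
mask[:2, 2] = True  # third column, first two rows

# DataFrame.mask replaces values where the condition is True (NaN by default)
# and returns a new frame, leaving df unchanged.
result = df.mask(mask)
print(result)
```

Unlike the loop above, this returns a new DataFrame rather than modifying `df` in place; assign it back with `df = df.mask(mask)` if that is what you want.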