I looked into existing threads regarding indexing, none of said threads address the present use case.
I would like to alter specific values in a DataFrame
based on their position therein, ie., I'd like the values in the second column from the first to the 4th row to be NaN
and values in the third column, first and second row to be NaN
say we have the following `DataFrame`:
df = pd.DataFrame(np.random.standard_normal((7,3)))
print(df)
0 1 2
0 -1.102888 1.293658 -2.290175
1 -1.826924 -0.661667 -1.067578
2 1.015479 0.058240 -0.228613
3 -0.760368 0.256324 -0.259946
4 0.496348 0.437496 0.646149
5 0.717212 0.481687 -2.640917
6 -0.141584 -1.997986 1.226350
And I want alter df
like below with the least amount of code:
0 1 2
0 -1.102888 NaN NaN
1 -1.826924 NaN NaN
2 1.015479 NaN -0.228613
3 -0.760368 NaN -0.259946
4 0.496348 0.437496 0.646149
5 0.717212 0.481687 -2.640917
6 -0.141584 -1.997986 1.226350
I tried using boolean indexing with .loc
but resulted in an error:
df.loc[(:2,1:) & (2:4,1)] = np.nan
# exception message:
df.loc[(:2,1:) & (2:4,1)] = np.nan
^
SyntaxError: invalid syntax
I also thought about converting the DataFrame
object to a numpy narray
object but then I wouldn't know how to use boolean in that case.
CodePudding user response:
One way is define the requirement and assign to be clear:
d = {1:4,2:2}
for col,val in d.items():
df.iloc[:val,col] = np.nan
print(df)
0 1 2
0 -1.102888 NaN NaN
1 -1.826924 NaN NaN
2 1.015479 NaN -0.228613
3 -0.760368 NaN -0.259946
4 0.496348 0.437496 0.646149
5 0.717212 0.481687 -2.640917
6 -0.141584 -1.997986 1.226350