I have three-column data saved in inp.dat:
1.0 2.0 5.0
2.0 3.0 6.0
3.0 4.0 8.0
4.0 1.0 7.0
5.0 2.0 8.0
5.0 2.0 8.0
1.0 2.0 5.0
2.0 3.0 6.0
Additionally, an index value is provided for each column: 3 for column 1, 4 for column 2, and 4 for column 3, which is written as index_value=[3,4,4]. For each column, I want to keep the data from 2 rows before to 2 rows after the given index value (counting rows from 0) and set all other values to zero.
The expected output should be saved as file.out, as shown below:
0.0 0.0 0.0
2.0 0.0 0.0
3.0 4.0 8.0
4.0 1.0 7.0
5.0 2.0 8.0
5.0 2.0 8.0
0.0 2.0 5.0
0.0 0.0 0.0
My code:
import numpy as np
import pandas as pd
data=np.loadtxt("inp.dat")
print(data.shape)
index_value=[3,4,4]
for i,data in enumerate(data):
    print(i,data)
data=data[index_value[0]-2:index_value[0]+2]
np.savetxt('file.out',data)
I am not getting the expected output with my trial code. Moreover, I want to apply it to many columns of data in the future. As I am a beginner, I hope experts can help me overcome this problem. Thanks in advance.
CodePudding user response:
You could apply your task column by column, looping through your index_list and the columns at the same time with zip. Then use a mask to set every value outside the window to 0.
import pandas as pd
df = pd.read_csv('data.dat', header=None, sep=r'\s+')
#this is only how I read the data to get the same example data you showed us
print(df)
0 1 2
0 1.0 2.0 5.0
1 2.0 3.0 6.0
2 3.0 4.0 8.0
3 4.0 1.0 7.0
4 5.0 2.0 8.0
5 5.0 2.0 8.0
6 1.0 2.0 5.0
7 2.0 3.0 6.0
index_list = [3, 4, 4]
for target_idx, col in zip(index_list, df.columns):
    mask = (df.index >= target_idx-2) & (df.index < target_idx+3)
    # for the first column mask looks like this:
    # [False True True True True True False False]
    df.loc[~mask, col] = 0 # set all values NOT in the mask to 0
print(df)
0 1 2
0 0.0 0.0 0.0
1 2.0 0.0 0.0
2 3.0 4.0 8.0
3 4.0 1.0 7.0
4 5.0 2.0 8.0
5 5.0 2.0 8.0
6 0.0 2.0 5.0
7 0.0 0.0 0.0
# If you would like to save it:
df.to_csv('file.out',header=False, index=False, sep='\t')
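Since the question starts from NumPy and mentions many more columns later, the same mask idea can also be sketched without a loop by using broadcasting. This is only a minimal sketch: the helper name keep_window and its half_width parameter are made up for illustration, and the sample array is typed inline here instead of being read from inp.dat.

```python
import numpy as np

def keep_window(data, centers, half_width=2):
    """Zero every entry more than half_width rows away from its column's center row.

    keep_window and half_width are hypothetical names used only for this sketch.
    """
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers)
    # Row positions as a column vector, shape (n_rows, 1)
    rows = np.arange(data.shape[0])[:, None]
    # Broadcasting against centers (shape (1, n_cols)) gives one mask per column
    mask = np.abs(rows - centers[None, :]) <= half_width
    # Keep values inside the window, replace everything else with 0.0
    return np.where(mask, data, 0.0)

# Same sample data as in the question, typed inline (assumption: in practice
# it would come from np.loadtxt("inp.dat"))
data = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 3.0, 6.0],
    [3.0, 4.0, 8.0],
    [4.0, 1.0, 7.0],
    [5.0, 2.0, 8.0],
    [5.0, 2.0, 8.0],
    [1.0, 2.0, 5.0],
    [2.0, 3.0, 6.0],
])
index_value = [3, 4, 4]

result = keep_window(data, index_value)
print(result)
```

The result could then be written with np.savetxt('file.out', result, fmt='%.1f'). Because the mask is built from row distances, a window that would extend past the first or last row is simply cut off at the edge of the array.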