selecting data based on index values

Time:07-11

I have three-column data saved in inp.dat:

1.0   2.0   5.0 
2.0   3.0   6.0 
3.0   4.0   8.0 
4.0   1.0   7.0 
5.0   2.0   8.0 
5.0   2.0   8.0
1.0   2.0   5.0 
2.0   3.0   6.0

Additionally, an index value is provided for each column: 3 for column 1, 4 for column 2, and 4 for column 3, written as index_value=[3,4,4]. Indices are zero-based row positions. For each column, I want to keep the value at the given index together with the 2 values before it and the 2 values after it, and set all other values in that column to zero.

The expected output should be saved as file.out as shown below.

0.0   0.0   0.0 
2.0   0.0   0.0 
3.0   4.0   8.0 
4.0   1.0   7.0 
5.0   2.0   8.0 
5.0   2.0   8.0
0.0   2.0   5.0 
0.0   0.0   0.0

My code:

import numpy as np
import pandas as pd
data=np.loadtxt("inp.dat")
print(data.shape)

index_value=[3,4,4]

for i,data in enumerate(data):
    print(i,data)
    data=data[index_value[0]-2:index_value[0]+2]
np.savetxt('file.out',data)

My trial code does not produce the expected output. I would also like to apply this to data with many more columns in the future. I am a beginner, so any help with this problem would be appreciated. Thanks in advance.

CodePudding user response:

You could handle this column by column, using zip to loop over the columns and your index_list together. For each column, build a boolean mask that marks the rows to keep, then set every value outside the mask to 0.

import pandas as pd

df = pd.read_csv('data.dat', header=None, sep=r'\s+')
# this is only how I read the data to get the same example data you showed us
print(df)

     0    1    2
0  1.0  2.0  5.0
1  2.0  3.0  6.0
2  3.0  4.0  8.0
3  4.0  1.0  7.0
4  5.0  2.0  8.0
5  5.0  2.0  8.0
6  1.0  2.0  5.0
7  2.0  3.0  6.0

index_list = [3, 4, 4]

for target_idx, col in zip(index_list, df.columns):
    
    mask = (df.index >= target_idx - 2) & (df.index < target_idx + 3)
    # for the first column mask looks like this:
    # [False  True  True  True  True  True False False]
    
    df.loc[~mask, col] = 0 # set all values NOT in the mask to 0

print(df)

     0    1    2
0  0.0  0.0  0.0
1  2.0  0.0  0.0
2  3.0  4.0  8.0
3  4.0  1.0  7.0
4  5.0  2.0  8.0
5  5.0  2.0  8.0
6  0.0  2.0  5.0
7  0.0  0.0  0.0

# If you would like to save it:
df.to_csv('file.out',header=False, index=False, sep='\t')
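
Since the question mentions many columns in the future, the same masking idea can also be written without an explicit loop by using NumPy broadcasting. This is only a minimal sketch under a few assumptions: the helper name keep_window and its half_width parameter are hypothetical (they do not appear in the question), the indices are zero-based row positions, and the sample array is typed in by hand here instead of being read from inp.dat.

```python
import numpy as np

def keep_window(data, centers, half_width=2):
    """Keep rows within half_width of each column's center index; zero the rest.

    keep_window and half_width are illustrative names, not part of the question.
    """
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers)
    # Row positions as a column vector, shape (n_rows, 1)
    rows = np.arange(data.shape[0])[:, None]
    # Broadcasting against centers (shape (1, n_cols)) gives one mask per column
    mask = np.abs(rows - centers[None, :]) <= half_width
    # Keep values where the mask is True, otherwise write 0.0
    return np.where(mask, data, 0.0)

# Same example data as in the question
data = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 3.0, 6.0],
    [3.0, 4.0, 8.0],
    [4.0, 1.0, 7.0],
    [5.0, 2.0, 8.0],
    [5.0, 2.0, 8.0],
    [1.0, 2.0, 5.0],
    [2.0, 3.0, 6.0],
])

result = keep_window(data, [3, 4, 4])
print(result)
```

With real data you would presumably load the array with np.loadtxt("inp.dat") and write the result with np.savetxt("file.out", result, fmt="%.1f"). The list of center indices must have one entry per column, otherwise the broadcast fails.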