How to create a subset of dataframe in python?

Time:04-07

I have a large dataset (a pandas DataFrame) with the following headers:
RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]
RCC = ['RUT1_Cycle_Counter']

Now I want to make several subsets of the original dataframe, as below.

subset_0
index,RUT1_Cycle_Counter, RUT1_Azi_0, RUT1_Dtctn_Probb_0, RUT1_Dtctn_ID_0, RUT1_Elev_0

subset_1
index,RUT1_Cycle_Counter, RUT1_Azi_1, RUT1_Dtctn_Probb_1, RUT1_Dtctn_ID_1, RUT1_Elev_1
.
.
.
subset_9
index,RUT1_Cycle_Counter, RUT1_Azi_9, RUT1_Dtctn_Probb_9, RUT1_Dtctn_ID_9, RUT1_Elev_9

How can I do this in Python? I am a beginner in Python.

Thank you very much in advance

CodePudding user response:

Here is an example:

import numpy as np
import pandas as pd

RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]

# made up example with the columns above
cols = RAM + RDP + RDI + REM  # concatenate the four lists into 40 column names
nrows = 10
df = pd.DataFrame(np.arange(nrows * len(cols)).reshape(nrows, -1), columns=cols)

Now:

subsets = [df[list(subcols)] for subcols in zip(RAM, RDP, RDI, REM)]

For example:

>>> subsets[5]
   RUT1_Azi_5  RUT1_Dtctn_Probb_5  RUT1_Dtctn_ID_5  RUT1_Elev_5
0           5                  15               25           35
1          45                  55               65           75
2          85                  95              105          115
3         125                 135              145          155
4         165                 175              185          195
5         205                 215              225          235
6         245                 255              265          275
7         285                 295              305          315
8         325                 335              345          355
9         365                 375              385          395

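Edit: modified answer to include a common list of columns for all subsets (RCC = ['RUT1_Cycle_Counter']). This assumes df actually contains the RUT1_Cycle_Counter column, which the made-up example above does not:

subsets = [df[RCC + list(subcols)] for subcols in zip(RAM, RDP, RDI, REM)]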
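As a self-contained sketch of the edited version, the snippet below builds a made-up dataframe that does include RUT1_Cycle_Counter and then splits it. The data values and the row count (nrows = 3) are assumptions chosen only for illustration; the column names come from the question.

```python
import numpy as np
import pandas as pd

RAM = [f"RUT1_Azi_{i}" for i in range(10)]
RDP = [f"RUT1_Dtctn_Probb_{i}" for i in range(10)]
RDI = [f"RUT1_Dtctn_ID_{i}" for i in range(10)]
REM = [f"RUT1_Elev_{i}" for i in range(10)]
RCC = ["RUT1_Cycle_Counter"]

# Made-up data (assumption): the counter column first, then the 40 per-index columns.
cols = RCC + RAM + RDP + RDI + REM
nrows = 3
df = pd.DataFrame(np.arange(nrows * len(cols)).reshape(nrows, -1), columns=cols)

# zip yields one (Azi_i, Probb_i, ID_i, Elev_i) tuple per index i;
# RCC is prepended so every subset keeps the shared counter column.
subsets = [df[RCC + list(subcols)] for subcols in zip(RAM, RDP, RDI, REM)]

print(subsets[1].columns.tolist())
```

Each element of subsets is a dataframe with five columns, and subsets[i] corresponds to subset_i in the question.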

CodePudding user response:

With pandas you can select a subset of a dataframe directly. As long as list_of_subset_headers is a subset of your dataframe's columns, just write:

sub_df=df[list_of_subset_headers]

Or, in this case:

sub_df0=df[['RUT1_Azi_0', 'RUT1_Dtctn_Probb_0', 'RUT1_Dtctn_ID_0', 'RUT1_Elev_0']]
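To avoid typing the column names out ten times, the same selection could be wrapped in a small helper and collected in a dictionary. This is only a sketch: subset_columns is a hypothetical helper name, the dictionary keys simply mirror the subset_0 ... subset_9 naming from the question, and the tiny two-index dataframe is made up for illustration.

```python
import pandas as pd

def subset_columns(i):
    """Column names for subset i (hypothetical helper, not part of pandas)."""
    return [
        "RUT1_Cycle_Counter",
        f"RUT1_Azi_{i}",
        f"RUT1_Dtctn_Probb_{i}",
        f"RUT1_Dtctn_ID_{i}",
        f"RUT1_Elev_{i}",
    ]

# Made-up frame (assumption) covering only indices 0 and 1, two rows each.
# The shared counter name repeats across indices but appears once as a dict key.
data = {name: [1, 2] for i in range(2) for name in subset_columns(i)}
df = pd.DataFrame(data)

# One dataframe per index, keyed by the names used in the question.
subsets = {f"subset_{i}": df[subset_columns(i)] for i in range(2)}

print(subsets["subset_1"].columns.tolist())
```

For the real data, range(2) would become range(10). A dictionary keeps the subset_0 style names without creating ten separate variables.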