Condition-based subset extraction from both a Pandas Dataframe and a Numpy array

Time:09-21

I have a Pandas Dataframe df and a numpy array ar of the same size. I can extract rows from df like this:

subdf = df[df['column'] == value]

But how can I extract corresponding rows from ar, i.e. rows with the same indices?

In my case, df is itself a subset of a bigger DataFrame, so df.index is not a run of consecutive integers.

CodePudding user response:

You can use:

import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3, 2, 1]})
ar = np.array([10, 20, 30, 20, 10])
ar[df['value'] == 2]  # the boolean mask is applied by position

output:

array([20, 20])

or, if you have higher dimensions:

ar = np.arange(20).reshape(4,5)
ar[:, df['value'] == 2]

input:

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

output:

array([[ 1,  3],
       [ 6,  8],
       [11, 13],
       [16, 18]])
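Since the question mentions that df.index is not consecutive, here is a minimal sketch of the same boolean-mask idea in that situation. The frame `big` and its values are made up for illustration, and the only assumption is that ar has one entry per row of df, in the same order. Converting the mask with `.to_numpy()` drops the index labels, so both selections are purely positional:

```python
import numpy as np
import pandas as pd

# Hypothetical data: df is a filtered subset of a bigger frame,
# so its index labels are 1, 3, 4, 6 rather than 0..3.
big = pd.DataFrame({'column': ['a', 'b', 'a', 'b', 'c', 'a', 'b']})
df = big[big['column'] != 'a']
ar = np.array([10, 20, 30, 40])  # assumed aligned with the rows of df

# Plain boolean array with no index attached.
mask = (df['column'] == 'b').to_numpy()

subdf = df[mask]   # rows of df where the condition holds
subar = ar[mask]   # the corresponding entries of ar
print(subdf.index.tolist(), subar)
```

Here the same mask is reused for both objects, so the rows of subdf and subar stay in step regardless of what the index labels are.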

CodePudding user response:

The question is not entirely clear to me, but here is an attempt:

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'Date': ['2021-09-14','2021-09-14','2021-09-14','2021-09-13','2021-09-12','2021-09-12','2021-09-11'],
    'Date_Yesterday': ['2021-09-13','2021-09-13','2021-09-13','2021-09-12','2021-09-11','2021-09-11','2021-09-10'],
    'Clicks': [100,100,100,50,10,10,1]
})
df

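array = df['Clicks'].values  # the column as a NumPy array

s = df.loc[3:4, 'Clicks'].index  # index labels of the selected rows (equal to positions only with the default index)

or

s = df['Clicks'].isin([50, 10])  # boolean mask of the selected rows

array[np.asarray(s)]  # select from the array using the labels or the mask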

CodePudding user response:

There are a couple of ways to do this. Given the array and dataframe:

import numpy as np
import pandas as pd

arr = np.array(([21, 22, 23], [11, 22, 33], [21, 77, 89]))
df = pd.DataFrame(data=arr, columns=['c1', 'c2', 'c3'])

If you extract rows from the dataframe, you can use the index from that sub-dataframe to get the corresponding rows from the numpy array.

df2 = df[df['c1'] == 21]
arr1 = arr[df2.index]
print(arr1)

Output:

[[21 22 23]
 [21 77 89]]
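One caveat: arr[df2.index] treats the index labels as positions, which only holds for the default 0..n-1 index. If df is itself a subset of a bigger frame, as in the question, the labels would first need to be translated into positions. A possible sketch, using `Index.get_indexer` and made-up data (`big` is hypothetical), assuming df.index has no duplicate labels and arr is aligned with the rows of df:

```python
import numpy as np
import pandas as pd

# Hypothetical bigger frame; filtering leaves df with index labels 0, 2, 3.
big = pd.DataFrame({'c1': [21, 5, 11, 21, 7]})
df = big[big['c1'] > 10]

# Assumed to be aligned with the three rows of df, in order.
arr = np.array([[21, 22, 23], [11, 22, 33], [21, 77, 89]])

df2 = df[df['c1'] == 21]  # index labels 0 and 3

# Translate the labels of df2 into row positions within df.
positions = df.index.get_indexer(df2.index)
arr1 = arr[positions]
print(arr1)
```

With this data, arr[df2.index] would fail because label 3 is outside the three rows of arr, whereas the translated positions select the first and last rows.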

You can also index the array directly with the same boolean condition that you used to get the sub-dataframe.

arr2 = arr[df['c1'] == 21]
print(arr2)

Output:

[[21 22 23]
 [21 77 89]]