Is there a way to access values in dataframe based on series of column names?

Time:10-20

I have a pandas dataframe that consists of columns of values, and a pandas series that consists of column-name suffixes. For each n, I need the value in the nth row of the column named by the nth element of the series. Note that the column name is constructed by prepending 'col' to the value in the series. I have looked for a fast way of doing this (vectorization or a list comprehension), but I hit a roadblock when I try to index into the dataframe using the index position of the series.

dataframe : {
'col1': [1, 2, 3, 4, 5],
'col2': [10, 20, 30, 40, 50],
'col3': [100, 200, 300, 400, 500]
}

series : [
'1', '2', '1', '3', '8'
]

output is a series : [
1, 20, 3, 400, numpy.nan
]

I am able to do this using a straightforward iterrows, but would like something faster (preferably vectorization, but if not list comprehension).

def test_cols():
    stub_data_df = pd.DataFrame({
        'col1': [1, 2, 3, 4, 5],
        'col2': [10, 20, 30, 40, 50],
        'col3': [100, 200, 300, 400, 500]
    })
    cols = pd.Series([
        '1', '2', '1', '3', '8'
    ])
    rates = []
    for i, row in stub_data_df.iterrows():
        rates.append(row.get('col' + cols[i]))
    print(pd.Series(rates))

output :

0      1.0
1     20.0
2      3.0
3    400.0
4      NaN
dtype: float64

CodePudding user response:

Here is a way to do this with a list comprehension:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [10, 20, 30, 40, 50],
                   'col3': [100, 200, 300, 400, 500]})
s = pd.Series(['1', '2', '1', '3', '8'])

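s = s.astype(int) - 1  # 'col1' sits at position 0, so subtract 1 for integer indexing
result = s.astype(float)  # float dtype so that NaN can be stored later

legal_ix = (s >= 0) & (s < len(df.columns))  # only columns that exist can be indexed
s = s[legal_ix]

# Assumes the default RangeIndex, so index labels double as row positions
result[legal_ix] = [df.iloc[i, j] for i, j in zip(s.index, s.values)]
result[~legal_ix] = np.nan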

print(result)
0      1.0
1     20.0
2      3.0
3    400.0
4      NaN
dtype: float64
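
The list comprehension above relies on the columns being ordered col1, col2, col3 so that the number maps directly to a position. As a hedged sketch of a variant that looks positions up by name instead, `Index.get_indexer` returns -1 for names that are not present. It assumes the column names are unique and that the series lines up with the rows positionally; the names `pos`, `found`, and `out` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [10, 20, 30, 40, 50],
                   'col3': [100, 200, 300, 400, 500]})
s = pd.Series(['1', '2', '1', '3', '8'])

# Position of each constructed column name; -1 marks a column that does not exist
pos = df.columns.get_indexer('col' + s)
found = pos >= 0

# Start from all-NaN floats, then fill only the rows whose column was found
out = np.full(len(df), np.nan)
rows = np.arange(len(df))
out[found] = df.to_numpy()[rows[found], pos[found]]

result = pd.Series(out, index=df.index)
print(result)
```

Because the lookup is by name, this keeps working if the columns are reordered.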

CodePudding user response:

The docs have an example related to this:

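# idx holds an integer code per row; cols becomes the unique column names
idx, cols = ('col' + cols).factorize()

# reindex adds an all-NaN column for any name missing from the dataframe
array = stub_data_df.reindex(cols, axis=1).to_numpy()
array = array[np.arange(len(stub_data_df)), idx]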

pd.Series(array)

0      1.0
1     20.0
2      3.0
3    400.0
4      NaN
dtype: float64
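
To check that the vectorized version matches the original loop, the two can be compared directly. The following is a minimal sketch, not a benchmark; the function names `lookup_loop` and `lookup_factorize` are hypothetical, and it assumes the series contains no missing values (factorize would code those as -1).

```python
import numpy as np
import pandas as pd

def lookup_loop(df, names):
    # Baseline from the question: one row at a time via iterrows
    return pd.Series([row.get(names[i]) for i, row in df.iterrows()])

def lookup_factorize(df, names):
    # Integer code per row, plus the unique column names in order of appearance
    idx, uniques = pd.factorize(names)
    # Missing names become all-NaN columns after reindexing
    wide = df.reindex(columns=uniques).to_numpy()
    return pd.Series(wide[np.arange(len(df)), idx])

stub_data_df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                             'col2': [10, 20, 30, 40, 50],
                             'col3': [100, 200, 300, 400, 500]})
names = 'col' + pd.Series(['1', '2', '1', '3', '8'])

expected = lookup_loop(stub_data_df, names)
actual = lookup_factorize(stub_data_df, names)
pd.testing.assert_series_equal(expected, actual)  # raises if they differ
```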