Home > database >  How to update numpy array based on pandas dataframe
How to update numpy array based on pandas dataframe

Time:02-20

I have a numpy array with thousands of rows and columns, and I'm wondering how to update each value based on the values in a pandas dataframe.

For example, let's say my array contains a list of years (here's an incredibly small sample just to give you the basic idea):

[[2020, 2015, 2017],
[2015, 2016, 2016],
[2019, 2018, 2020]]

I want to change each value in the array to "Lat" based on the "Year". So if my pandas dataframe looks like this:

Year Lat Lon
2020 37.2 103.45
2019 46.1 107.82
2018 35.2 101.45
2017 38.6 110.62
2016 29.1 112.73
2015 33.8 120.92

Then the output array should look like:

[[37.2, 33.8, 38.6],
[33.8, 29.1, 29.1],
[46.1, 35.2, 37.2]]

If my dataset were truly this small, it wouldn't be a problem, but considering I have millions of values in the array and thousands of values in the dataframe, I'm a little overwhelmed on how to go about this efficiently. Any suggestions would be greatly appreciated!

CodePudding user response:

Maybe setting Year as index and using loc would help

# Data
arr = np.array([[2020, 2015, 2017], [2015, 2016, 2016], [2019, 2018, 2020]])
df = pd.DataFrame({'Year': {0: 2020, 1: 2019, 2: 2018, 3: 2017, 4: 2016, 5: 2015},
                   'Lat': {0: 37.2, 1: 46.1, 2: 35.2, 3: 38.6, 4: 29.1, 5: 33.8},
                   'Lon': {0: 103.45, 1: 107.82, 2: 101.45, 3: 110.62, 4: 112.73, 5: 120.92}})

df = df.set_index("Year")
np.array([[df.loc[year, "Lat"] for year in years] for years in arr])
# array([[37.2, 33.8, 38.6],
#        [33.8, 29.1, 29.1],
#        [46.1, 35.2, 37.2]])

CodePudding user response:

You're basically mapping values across columns. One idea is to use indexing to locate the elements that need to be replaced for a given key, then replace them all at once. This takes one iteration for each key-value pair in the original data.

Example:

import numpy as np
import pandas as pd

a = np.array([
    [2020, 2015, 2017],
    [2015, 2016, 2016],
    [2019, 2018, 2020],
])
b = np.zeros(a.shape, dtype=float)

df = pd.DataFrame({
    'Year': [2020, 2019, 2018, 2017, 2016, 2015],
    'Lat': [37.2, 46.1, 35.2, 38.6, 29.1, 33.8],
})

for k, v in df.set_index('Year')['Lat'].to_dict().items():
    b[a == k] = v
print(b)

# output:
# [[37.2 33.8 38.6]
#  [33.8 29.1 29.1]
#  [46.1 35.2 37.2]]
  • Related