I have a numpy array with thousands of rows and columns, and I'm wondering how to update each value based on the values in a pandas dataframe.
For example, let's say my array contains a list of years (here's an incredibly small sample just to give you the basic idea):
[[2020, 2015, 2017],
[2015, 2016, 2016],
[2019, 2018, 2020]]
I want to change each value in the array to "Lat" based on the "Year". So if my pandas dataframe looks like this:
Year | Lat | Lon |
---|---|---|
2020 | 37.2 | 103.45 |
2019 | 46.1 | 107.82 |
2018 | 35.2 | 101.45 |
2017 | 38.6 | 110.62 |
2016 | 29.1 | 112.73 |
2015 | 33.8 | 120.92 |
Then the output array should look like:
[[37.2, 33.8, 38.6],
[33.8, 29.1, 29.1],
[46.1, 35.2, 37.2]]
If my dataset were truly this small, it wouldn't be a problem, but considering I have millions of values in the array and thousands of values in the dataframe, I'm a little overwhelmed on how to go about this efficiently. Any suggestions would be greatly appreciated!
CodePudding user response:
Maybe setting Year
as index and using loc
would help
# Data
arr = np.array([[2020, 2015, 2017], [2015, 2016, 2016], [2019, 2018, 2020]])
df = pd.DataFrame({'Year': {0: 2020, 1: 2019, 2: 2018, 3: 2017, 4: 2016, 5: 2015},
'Lat': {0: 37.2, 1: 46.1, 2: 35.2, 3: 38.6, 4: 29.1, 5: 33.8},
'Lon': {0: 103.45, 1: 107.82, 2: 101.45, 3: 110.62, 4: 112.73, 5: 120.92}})
df = df.set_index("Year")
np.array([[df.loc[year, "Lat"] for year in years] for years in arr])
# array([[37.2, 33.8, 38.6],
# [33.8, 29.1, 29.1],
# [46.1, 35.2, 37.2]])
CodePudding user response:
You're basically mapping values across columns. One idea is to use indexing to locate the elements that need to be replaced for a given key, then replace them all at once. This takes one iteration for each key-value pair in the original data.
Example:
import numpy as np
import pandas as pd
a = np.array([
[2020, 2015, 2017],
[2015, 2016, 2016],
[2019, 2018, 2020],
])
b = np.zeros(a.shape, dtype=float)
df = pd.DataFrame({
'Year': [2020, 2019, 2018, 2017, 2016, 2015],
'Lat': [37.2, 46.1, 35.2, 38.6, 29.1, 33.8],
})
for k, v in df.set_index('Year')['Lat'].to_dict().items():
b[a == k] = v
print(b)
# output:
# [[37.2 33.8 38.6]
# [33.8 29.1 29.1]
# [46.1 35.2 37.2]]