I have gridded daily temperature data but am only interested in the winter months.
from netCDF4 import Dataset as netcdf_dataset
import numpy as np
import xarray as xr
# open the Berkeley Earth gridded temperature netCDF file
df = xr.open_dataset('BerkeleyEarth.nc')
#pull out temperature variable
air=df.temperature
#select only winter months
WinterAir = air[(air.time.dt.month >= 12) | (air.time.dt.month <= 2)]
When I try to select the months this way, I get the following error: AttributeError: 'DataArray' object has no attribute 'month'. How do I select only the winter months?
Here is a screenshot of the netCDF file: [screenshot not included]
Answer:
I was able to do this by:
# select only winter months
WinterAir = air[(df.month >= 12) | (df.month <= 2)]
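Here is a minimal sketch of a slightly more defensive variant of the same approach. It runs on a small synthetic dataset that is assumed to mirror your file: a month data variable and a temperature variable sharing a bare time dimension. Indexing with air[...] applies the mask to whichever dimension comes first, so it only works when time is the first dimension of temperature. Selecting with isel by dimension name removes that dependency, and isin spells out the winter months explicitly (month >= 12 is equivalent to month == 12).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the file (assumed layout): "time" is a bare dimension
# and the month number is stored as an ordinary data variable.
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
df = xr.Dataset(
    data_vars={
        "month": (("time",), dates.month),
        "temperature": (("lat", "lon", "time"), np.random.random((2, 4, len(dates)))),
    },
)
air = df.temperature

# Boolean mask over the time dimension: True for December, January and February
winter_mask = df.month.isin([12, 1, 2])

# Select by dimension name, so the position of "time" in air.dims does not matter
WinterAir = air.isel(time=winter_mask.values)
print(WinterAir.sizes)
```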
Answer:
The reason the .dt approach doesn't work for your data specifically is that you don't have a datetime coordinate named time. Instead, you have a dimension time without any coordinate data labeling it, plus several data variables holding the individual date components (day, month, year). Because of this, you can reference the month data variable directly and use it to slice your data.
You could always construct a datetime coordinate from the day, month, and year values in your data and assign it as the time coordinate; the usual time series functionality built into xarray would then work.
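A minimal sketch of that conversion, assuming your file stores the date components in data variables named day, month and year along a bare time dimension (the dataset below is synthetic). pandas.to_datetime can assemble datetimes from a DataFrame whose columns are named year, month and day, and the result is attached with assign_coords.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic dataset shaped like the file described above (assumed names:
# "day", "month", "year" data variables on a bare "time" dimension).
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
ds = xr.Dataset(
    data_vars={
        "day": (("time",), dates.day),
        "month": (("time",), dates.month),
        "year": (("time",), dates.year),
        "temperature": (("lat", "lon", "time"), np.random.random((2, 4, len(dates)))),
    },
)

# Assemble datetimes from the component variables; pandas accepts a DataFrame
# whose columns are named year, month and day.
times = pd.to_datetime(
    pd.DataFrame({"year": ds.year.values, "month": ds.month.values, "day": ds.day.values})
)

# Attach them as a real coordinate on "time" and drop the now-redundant parts
ds = ds.assign_coords(time=("time", times.values)).drop_vars(["day", "month", "year"])

# The .dt accessor now works, so winter can be selected directly
winter = ds.sel(time=ds.time.dt.month.isin([12, 1, 2]))
print(winter.sizes["time"])
```

Dropping the day, month and year variables is optional; they are kept in sync with the new coordinate either way.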
As an example, here's a dataset similar to yours in structure:
In [6]: import numpy as np
   ...: import pandas as pd
   ...: import xarray as xr
   ...:
   ...: dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
   ...:
   ...: ds = xr.Dataset(
   ...:     coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45]},
   ...:     data_vars={
   ...:         "day": (("time",), dates.day),
   ...:         "month": (("time",), dates.month),
   ...:         "year": (("time",), dates.year),
   ...:         "temperature": (
   ...:             ("lat", "lon", "time"),
   ...:             np.random.random(size=(2, 4, len(dates))),
   ...:         ),
   ...:     },
   ...: )
In [7]: ds
Out[7]:
<xarray.Dataset>
Dimensions: (time: 366, lat: 2, lon: 4)
Coordinates:
* lon (lon) int64 -135 -45 45 135
* lat (lat) int64 -45 45
Dimensions without coordinates: time
Data variables:
day (time) int64 1 2 3 4 5 6 7 8 9 ... 23 24 25 26 27 28 29 30 31
month (time) int64 1 1 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12 12 12
year (time) int64 2020 2020 2020 2020 2020 ... 2020 2020 2020 2020
temperature (lat, lon, time) float64 0.2308 0.3257 ... 0.3501 0.009162
Note that time is a special "dimension without coordinates". This means that there are no labels on the time dimension, and xarray knows nothing about time except its length and the fact that it is the dimension indexing several of your data variables. Importantly, in your data, time is not a datetime type.
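One way to check which situation a dataset is in is a small test like the one below. has_datetime_time is a hypothetical helper written for illustration, not part of xarray. It only recognises numpy datetime64 coordinates, so cftime-based calendars (which .dt also supports) would report False.

```python
import numpy as np
import pandas as pd
import xarray as xr


def has_datetime_time(ds: xr.Dataset, dim: str = "time") -> bool:
    """Return True if `dim` carries a datetime64 coordinate (hypothetical helper)."""
    # A dimension without coordinates appears in ds.dims but not in ds.coords
    return dim in ds.coords and np.issubdtype(ds[dim].dtype, np.datetime64)


dates = pd.date_range("2020-01-01", "2020-01-10", freq="D")
temps = np.random.random((2, len(dates)))

# "time" is only a dimension here: no labels, no datetime information
bare = xr.Dataset({"temperature": (("lat", "time"), temps)})
# Same data, but with the dates attached as the "time" coordinate
labelled = bare.assign_coords(time=("time", dates))

print(has_datetime_time(bare))      # False
print(has_datetime_time(labelled))  # True
```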
Because month is a data variable in the dataset, you need to reference month directly, as you found, and the DatetimeAccessor ds.time.dt is not available:
In [8]: ds.loc[{"time": ds.month == 2}]
Out[8]:
<xarray.Dataset>
Dimensions: (time: 29, lat: 2, lon: 4)
Coordinates:
* lon (lon) int64 -135 -45 45 135
* lat (lat) int64 -45 45
Dimensions without coordinates: time
Data variables:
day (time) int64 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
month (time) int64 2 2 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2 2
year (time) int64 2020 2020 2020 2020 2020 ... 2020 2020 2020 2020
temperature (lat, lon, time) float64 0.2821 0.08776 0.2018 ... 0.929 0.4774
If the time dimension had a corresponding coordinate of type datetime, e.g. by assigning the previous dates array to the time coordinate, everything would work as you expect:
In [10]: dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
    ...:
    ...: ds = xr.Dataset(
    ...:     coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45], "time": dates},
    ...:     data_vars={
    ...:         "temperature": (
    ...:             ("lat", "lon", "time"),
    ...:             np.random.random(size=(2, 4, len(dates))),
    ...:         ),
    ...:     },
    ...: )
In [11]: ds
Out[11]:
<xarray.Dataset>
Dimensions: (lat: 2, lon: 4, time: 366)
Coordinates:
* lon (lon) int64 -135 -45 45 135
* lat (lat) int64 -45 45
* time (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2020-12-31
Data variables:
temperature (lat, lon, time) float64 0.09064 0.5252 ... 0.08733 0.6283
Now the xarray datetime accessors work the way you'd expect:
In [12]: ds.loc[{"time": ds.time.dt.month == 2}]
Out[12]:
<xarray.Dataset>
Dimensions: (lat: 2, lon: 4, time: 29)
Coordinates:
* lon (lon) int64 -135 -45 45 135
* lat (lat) int64 -45 45
* time (time) datetime64[ns] 2020-02-01 2020-02-02 ... 2020-02-29
Data variables:
temperature (lat, lon, time) float64 0.3407 0.6847 0.3073 ... 0.8578 0.1335
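With a datetime coordinate in place, selecting the full winter is a one-liner. The sketch below rebuilds the dataset from the previous example (random data) and shows two equivalent selections: one using the .dt.season label and one listing the month numbers.

```python
import numpy as np
import pandas as pd
import xarray as xr

dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
ds = xr.Dataset(
    coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45], "time": dates},
    data_vars={
        "temperature": (("lat", "lon", "time"), np.random.random((2, 4, len(dates)))),
    },
)

# .dt.season labels each date as "DJF", "MAM", "JJA" or "SON"
winter = ds.sel(time=ds.time.dt.season == "DJF")

# Equivalent selection by month number
winter_by_month = ds.sel(time=ds.time.dt.month.isin([12, 1, 2]))

print(winter.sizes["time"], winter_by_month.sizes["time"])
```

Keep in mind that within a single calendar year, "DJF" groups January and February 2020 together with December 2020, which belongs to the following winter; if you need contiguous winters, December has to be relabelled with the next year before grouping.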
See xarray's docs on Coordinates and working with time series data for more info.