Every biome classification in this document begins with two climate numbers: an annual mean temperature and an annual precipitation for a place. This chapter covers how to get those numbers. The get_climate() function takes one or more coordinate pairs and returns the climate values at each.
Before the function itself, a few words on where the climate data comes from. WorldClim, the source get_climate() draws on, is a real departure from how ecologists have traditionally obtained climate data, and the departure is worth understanding.
5.1 Where climate data comes from
For most of the history of ecology, climate data came from weather stations. Using it meant finding the right station, downloading its high-frequency records (often daily), and aggregating those records to the timescale the analysis needed, usually monthly or annual averages. This worked, but it had real limitations. Weather stations are unevenly distributed: dense in populated and agricultural regions, sparse in deserts, mountains, and the high latitudes. The periods of record vary from station to station, so two nearby stations might cover different decades. And for a remote location, or for an analysis spanning a large area, there might be no well-placed station at all. The data existed, but getting consistent climate values for an arbitrary point on Earth was genuinely hard.
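The aggregation step in that station-based workflow can be sketched in base R. The daily records below are simulated rather than taken from a real station, and the column names (date, temp_c, precip_mm) are assumptions made for the illustration.

```r
## simulate one year of daily records for a single hypothetical station
set.seed(42)
dates <- seq(as.Date("1999-01-01"), as.Date("1999-12-31"), by = "day")
day_of_year <- as.numeric(format(dates, "%j"))
daily <- data.frame(
  date = dates,
  temp_c = 20 + 5 * sin(2 * pi * (day_of_year - 100) / 365) + rnorm(length(dates)),
  precip_mm = rgamma(length(dates), shape = 0.3, scale = 10)
)

## label each day with its month
daily$month <- format(daily$date, "%m")

## temperature is averaged within each month; precipitation is summed
monthly_temp <- aggregate(temp_c ~ month, data = daily, FUN = mean)
monthly_precip <- aggregate(precip_mm ~ month, data = daily, FUN = sum)
monthly <- merge(monthly_temp, monthly_precip, by = "month")

## annual summaries: mean temperature and total precipitation
annual_mean_temp <- mean(daily$temp_c)
annual_precip <- sum(daily$precip_mm)

## show the monthly table
monthly
```

Every station needs this treatment separately, and each may cover different years, which is the inconsistency the paragraph above describes.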
WorldClim takes a different approach. Rather than serving station records directly, it interpolates them into continuous global surfaces. The current version, WorldClim 2.1, is built from the records of somewhere between 9,000 and 60,000 weather stations (the count varies by climate variable). Those station records, all covering the 1970–2000 period, are interpolated across space using a thin-plate spline method, with covariates that improve the interpolation where stations are sparse: elevation, distance to the coast, and satellite measurements of land surface temperature and cloud cover from the MODIS platform. The satellite covariates matter most in exactly the data-poor regions where station interpolation alone would struggle.
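The idea of interpolating station values across space with the help of a covariate can be sketched with a thin-plate regression spline from the mgcv package. This is a stand-in for illustration, not a reproduction of the WorldClim procedure: the stations, their elevations, and the temperature relationship are all simulated, and treating elevation as a simple linear term is an assumption of the sketch.

```r
library(mgcv)

## simulate 200 hypothetical stations scattered over a 10 x 10 degree region
set.seed(1)
n <- 200
stations <- data.frame(
  lon = runif(n, 0, 10),
  lat = runif(n, 0, 10),
  elev_m = runif(n, 0, 2000)
)

## made-up "truth": cooler toward higher latitude and higher elevation, plus noise
stations$temp_c <- 25 - 0.5 * stations$lat - 0.0065 * stations$elev_m +
  rnorm(n, sd = 0.3)

## thin-plate spline over space (bs = "tp"), with elevation as a covariate
fit <- gam(temp_c ~ s(lon, lat, bs = "tp") + elev_m, data = stations)

## predict at a location that has no station
new_point <- data.frame(lon = 5, lat = 5, elev_m = 1000)
predicted <- as.numeric(predict(fit, newdata = new_point))

## the value the simulated relationship implies at that location
expected <- 25 - 0.5 * 5 - 0.0065 * 1000

c(predicted = predicted, expected = expected)
```

Repeating the prediction for the center of every grid cell is what turns a set of scattered stations into a continuous surface.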
The result is a climate value for every cell of a global grid covering all land surfaces. There is no station to find. A coordinate anywhere on land returns a climate value, because the interpolation has already filled every cell. The periods are consistent: every cell represents the same 1970–2000 baseline. And the data is already aggregated to the annual and monthly summaries an ecological analysis usually wants, including the nineteen derived bioclimatic variables. This document uses two of them: BIO1, annual mean temperature, and BIO12, annual precipitation.
WorldClim distributes the data as GeoTIFF files. A GeoTIFF is a raster format: a rectangular grid of cells, each cell holding one value, with embedded coordinate metadata so the grid maps onto real geographic space. The get_climate() function handles the download and the cell lookup, so the GeoTIFF format is something the function manages rather than something you interact with directly.
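The cell lookup itself is simple arithmetic on a regular grid. The sketch below is not how get_climate() is implemented; it assumes a global grid whose upper-left corner sits at longitude -180, latitude 90, and cell_index() is a hypothetical helper written for this illustration.

```r
## grid definition for a global 2.5 arc-minute raster (assumed extent)
res_deg <- 2.5 / 60   # cell size in degrees
x_min <- -180         # western edge of the grid
y_max <- 90           # northern edge of the grid

## hypothetical helper: which row and column hold a given coordinate?
cell_index <- function(lon, lat) {
  col <- floor((lon - x_min) / res_deg) + 1   # columns count eastward
  row <- floor((y_max - lat) / res_deg) + 1   # rows count southward
  data.frame(row = row, col = col)
}

## the cell containing Honolulu (lon -157.86, lat 21.31)
honolulu_cell <- cell_index(lon = -157.86, lat = 21.31)
honolulu_cell
```

The embedded coordinate metadata in a GeoTIFF supplies the grid origin and cell size, which is what makes a lookup like this possible without any search.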
The storage strategy changes with resolution. At 10, 5, and 2.5 arc-minutes, the whole world fits in a single GeoTIFF per variable, a few hundred megabytes at most. At 30 arc-seconds (roughly one kilometer at the equator) the global dataset is far larger. Rather than one enormous file, WorldClim splits the 30-arcsecond data into tiles, each covering a block of the globe, and you download only the tiles that cover your area of interest. The get_climate() function manages the distinction: at the coarser resolutions it fetches the single global grid; at 30 arc-seconds it fetches the tile or tiles your points fall within. Either way the download happens once and is cached. This document works mostly at 2.5 arc-minutes, the choice discussed in the Scale chapter, with one finer-resolution example to show what 30 arc-seconds reveals.
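A little arithmetic shows why the finest resolution is handled differently. The sketch below assumes each grid spans the full globe (360 by 180 degrees) and uses roughly 111.32 kilometers per degree at the equator; the cell counts include ocean cells, so they overstate the land data somewhat.

```r
## the four WorldClim resolutions, in arc-minutes (30 arc-seconds = 0.5)
res_arcmin <- c(10, 5, 2.5, 0.5)

## approximate length of one degree at the equator, in kilometers
km_per_degree <- 111.32

grid_sizes <- data.frame(
  res_arcmin = res_arcmin,
  cell_km = res_arcmin / 60 * km_per_degree,   # cell width at the equator
  n_cols = 360 * 60 / res_arcmin,              # cells around the globe
  n_rows = 180 * 60 / res_arcmin               # cells from pole to pole
)

## total number of cells in each global grid
grid_sizes$n_cells <- grid_sizes$n_cols * grid_sizes$n_rows
grid_sizes
```

Moving from 2.5 arc-minutes to 30 arc-seconds divides the cell width by five and multiplies the number of cells by twenty-five.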
5.2 Setup
Every code-bearing chapter begins with a setup chunk. Run it first.
## the whittakerr toolkit: get_climate(), name_biome(), plot_biomes()
library(whittakerr)
## read_csv() for reading inline data tables
library(readr)
## data manipulation: bind_cols() and select()
library(dplyr)
## formatted tables for display
library(gt)
## suppress read_csv() column-type messages for the whole chapter
options(readr.show_col_types = FALSE)
5.3 Retrieving climate for one location
The get_climate() function takes a longitude and a latitude and returns the climate at that point. Here it is for Honolulu:
## retrieve historical climate for Honolulu
honolulu <- get_climate(lon = -157.86, lat = 21.31)
## show the result
honolulu
The first call to get_climate() triggers the WorldClim download described above. At 2.5 arc-minute resolution this is a few hundred megabytes, and it takes a few minutes. The download happens once. Afterward the data is cached, and subsequent calls are fast.
The returned object is a table with one row per location. The columns are the longitude and latitude you supplied, then mat_c (mean annual temperature, in degrees Celsius), map_mm (mean annual precipitation, in millimeters), map_cm (the same precipitation in centimeters), and scenario (here, the historical 1970–2000 baseline). For Honolulu, expect a warm temperature near 24 degrees Celsius and moderate precipitation: the coordinate falls on the drier, leeward side of Oahu.
Both precipitation columns are provided for convenience. WorldClim reports precipitation in millimeters; the Whittaker biome functions in this document expect centimeters. Having both means you can pass map_cm straight into name_biome() or plot_biomes() without converting anything yourself.
5.4 Several locations at once
get_climate() is vectorized. Pass it a vector of longitudes and a matching vector of latitudes, and it returns one row per location. The natural way to organize several locations is a table.
Here are three Pacific-coast cities with very different climates, written as an inline data table:
## three Pacific-coast cities as an inline data table
cities <- read_csv("name, lon, lat
Honolulu, -157.86, 21.31
Los Angeles, -118.24, 34.05
Seattle, -122.33, 47.61")
## display the table with its data source noted
cities |>
  gt() |>
  tab_source_note("Approximate city-center coordinates.")
| name | lon | lat |
|---|---|---|
| Honolulu | -157.86 | 21.31 |
| Los Angeles | -118.24 | 34.05 |
| Seattle | -122.33 | 47.61 |

Approximate city-center coordinates.
Building the data as a table, rather than as separate vectors of names, longitudes, and latitudes, has a practical benefit. Each row is one location, and its name, longitude, and latitude stay together. Adding or removing a city is a one-line edit, and there’s no way for the values to drift out of alignment. Data input is where analysis errors most often begin; a table keeps each observation intact.
Now retrieve the climate for all three cities in a single call:
## retrieve climate for all three cities in a single call
cities_climate <- get_climate(lon = cities$lon, lat = cities$lat)
get_climate() returns its climate columns in the same row order as the coordinates it received, so the results line up with the cities table row for row. Combining the two takes two steps. The get_climate() result carries its own lon and lat columns, so binding the two tables whole would produce duplicates. Instead, first take just the climate columns wanted, then attach them to the cities table:
## step 1: take just the climate columns from the get_climate() result
climate_values <- select(cities_climate, mat_c, map_mm, map_cm)
## step 2: attach those columns to the cities table
cities_result <- bind_cols(cities, climate_values)
## display the combined table with the climate-data source noted
cities_result |>
  gt() |>
  tab_source_note("Climate values: WorldClim 2.1, 1970–2000 baseline.")
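Because the two-step combination relies on row order, a cautious analysis might confirm the alignment before binding. The sketch below uses base R only, so it runs without the packages above; the stand-in tables mirror the column names described in this chapter, and the climate numbers in them are placeholders, not WorldClim output.

```r
## stand-in for the cities table, using the coordinates from this chapter
cities <- data.frame(
  name = c("Honolulu", "Los Angeles", "Seattle"),
  lon = c(-157.86, -118.24, -122.33),
  lat = c(21.31, 34.05, 47.61)
)

## stand-in for a get_climate() result; the climate values are placeholders
cities_climate <- data.frame(
  lon = cities$lon,
  lat = cities$lat,
  mat_c = c(1, 2, 3),
  map_mm = c(10, 20, 30),
  map_cm = c(1, 2, 3)
)

## confirm both tables describe the same points in the same order
stopifnot(
  nrow(cities) == nrow(cities_climate),
  isTRUE(all.equal(cities$lon, cities_climate$lon)),
  isTRUE(all.equal(cities$lat, cities_climate$lat))
)

## only then attach the climate columns, leaving out the duplicate lon and lat
cities_checked <- cbind(cities, cities_climate[, c("mat_c", "map_mm", "map_cm")])
names(cities_checked)
```

If a row were dropped or reordered on either side, stopifnot() would halt before any mismatched values reached the combined table.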
The three cities span a wide climate range. Honolulu is warm and moderately wet. Los Angeles is mild and dry. Seattle is cool and wet. These differences are exactly what the Whittaker diagram organizes, and later chapters place these same cities on it.
5.5 A finer resolution
Every example so far has used the document’s standard resolution of 2.5 arc-minutes. The resolution argument changes that. Its finest setting, 30 arc-seconds, resolves local climate features that the coarser grid averages away. The Scale chapter makes this argument in general; here is the retrieval that shows it on a single point.
The point is on Oahu’s Waianae coast, the island’s dry leeward side. At 2.5 arc-minutes a single grid cell is roughly four and a half kilometers across, large enough to blend the dry coastal lowland with the wetter slopes of the Waianae Range just inland. Here is the climate at that point, at the standard resolution:
## a point on Oahu's dry leeward (Waianae) coast, at the default 2.5 arc-minute resolution
leeward_coarse <- get_climate(lon = -158.17, lat = 21.45)
## show the result
leeward_coarse
Now the same coordinate at 30 arc-seconds. As Section 5.1 described, the 30-arcsecond data is distributed as tiles rather than one global grid, so the first call downloads the tile covering Oahu. Like the global grid, the tile is cached after the first fetch.
## the same point at 30-arcsecond resolution (30 arc-seconds = 0.5 arc-minutes)
leeward_fine <- get_climate(lon = -158.17, lat = 21.45, resolution = 0.5)
## show the result
leeward_fine
The two results are easier to read side by side. Label each one by its resolution, then stack them into a single table:
## label each result by its resolution
coarse_labeled <- mutate(leeward_coarse, resolution = "2.5 arc-minutes")
fine_labeled <- mutate(leeward_fine, resolution = "30 arc-seconds")
## stack the two into one table
leeward_compare <- bind_rows(coarse_labeled, fine_labeled)
## display temperature and precipitation at the two resolutions
leeward_compare |>
  select(resolution, mat_c, map_cm) |>
  gt() |>
  tab_source_note("Climate values: WorldClim 2.1, 1970–2000 baseline.")
The finer grid should report the drier value. Its smaller cell isolates the leeward lowland, while the larger cell blends that lowland with wetter ground nearby. Whether the difference matters depends on the question being asked. For placing a city on the Whittaker diagram at continental scale, 2.5 arc-minutes is enough. For telling a dry pocket apart from its surroundings, 30 arc-seconds is the resolution that makes the pocket visible. The resolution argument is the single control over that tradeoff.
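The blending effect can be sketched with a toy grid. One 2.5 arc-minute cell spans a 5 by 5 block of 30-arcsecond cells. The precipitation values below are invented for illustration, and treating the coarse value as the plain mean of its fine cells is a simplifying assumption.

```r
## made-up annual precipitation (cm) across one coarse cell:
## dry at the coast (column 1), wetter toward the mountains (column 5)
precip_by_column <- c(50, 60, 80, 110, 150)

## a 5 x 5 block of 30-arcsecond cells; each column holds one value
fine_cells <- matrix(rep(precip_by_column, each = 5), nrow = 5, ncol = 5)

## value of the fine cell that contains a coastal point
fine_value <- fine_cells[3, 1]

## value of the coarse cell, taken here as the mean of all 25 fine cells
coarse_value <- mean(fine_cells)

c(fine = fine_value, coarse = coarse_value)
```

In this toy case the coastal point reads 50 cm on the fine grid and 90 cm on the coarse one, because the coarse cell averages in the wetter ground inland.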
5.6 Future climate
Every retrieval so far has returned the historical baseline, the WorldClim 1970–2000 average. The scenario argument also reaches projected future climate. Setting scenario = "future" switches the source from the WorldClim historical surfaces to a projection from CMIP6, the Coupled Model Intercomparison Project.
A future projection is defined by three choices, and get_climate() exposes each as an argument. The gcm is the global climate model: a physics-based simulation of the atmosphere, ocean, and land. The ssp is the Shared Socioeconomic Pathway: a scenario for how population, economy, and emissions develop through the century. The period is the time window the projection averages over. The defaults used here are one mid-sensitivity model (MPI-ESM1-2-HR), the middle-of-the-road SSP2-4.5 pathway, and the 2041–2060 mid-century window.
How the models and the pathways differ from one another is a substantial subject, well beyond the scope of this chapter. The model-comparison framework is described by Eyring et al. (2016) in Geoscientific Model Development; the socioeconomic pathways are described by Riahi et al. (2017) in Global Environmental Change. The IPCC Sixth Assessment Report (Working Group I, 2021) synthesizes both for a general scientific readership. A careful analysis usually retrieves several models and pathways rather than one, and treats the spread among them as a measure of the projection’s uncertainty. The single default projection used here is enough to show the mechanics.
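Treating the spread among projections as a measure of uncertainty amounts to a grouped summary. The sketch below uses base R and a placeholder table: the model and pathway labels are generic, and the temperatures are invented for illustration rather than drawn from CMIP6.

```r
## placeholder projections for one location; not CMIP6 output
projections <- data.frame(
  gcm = c("model_a", "model_b", "model_c", "model_a", "model_b", "model_c"),
  ssp = c("low", "low", "low", "high", "high", "high"),
  mat_c = c(20.0, 20.4, 20.9, 21.0, 21.6, 22.3)
)

## mean across models within each pathway
model_mean <- tapply(projections$mat_c, projections$ssp, mean)

## spread across models within each pathway (warmest minus coolest)
model_range <- tapply(projections$mat_c, projections$ssp,
                      function(x) max(x) - min(x))

## one row per pathway
spread <- data.frame(
  ssp = names(model_mean),
  mean_c = as.numeric(model_mean),
  range_c = as.numeric(model_range)
)
spread
```

A wide range within a pathway signals that the choice of model matters for that location; a narrow one signals that the models largely agree.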
Retrieve the future climate for the same three cities:
## retrieve a future projection for the three cities
## (defaults: model MPI-ESM1-2-HR, pathway SSP2-4.5, period 2041-2060)
cities_future <- get_climate(lon = cities$lon, lat = cities$lat,
                             scenario = "future")
## show the result
cities_future
The returned table carries three columns the historical results did not: gcm, ssp, and period, recording which projection produced the values. As with the historical grid, the CMIP6 data downloads on the first call and is cached afterward.
Set the future values next to the historical ones, one row per city:
## historical and future climate side by side
climate_change <- tibble(
  city = cities$name,
  temp_historical = cities_climate$mat_c,
  temp_future = cities_future$mat_c,
  precip_historical = cities_climate$map_cm,
  precip_future = cities_future$map_cm
)
## display the comparison, with a unit footnote on each set of
## columns whose units are not carried in the column name
climate_change |>
  gt() |>
  tab_footnote(
    footnote = "Degrees Celsius.",
    locations = cells_column_labels(columns = c(temp_historical, temp_future))
  ) |>
  tab_footnote(
    footnote = "Centimeters.",
    locations = cells_column_labels(columns = c(precip_historical, precip_future))
  ) |>
  tab_source_note("Historical: WorldClim 2.1, 1970–2000. Future: CMIP6, MPI-ESM1-2-HR, SSP2-4.5, 2041–2060.")
Each city’s projected mid-century temperature is warmer than its 1970–2000 baseline. The size of the increase varies with the location, and would vary again with a different model or pathway. Precipitation changes are smaller and less consistent in direction: precipitation is the harder of the two variables to project, and a warming climate does not move every place the same way. The Whittaker diagram makes a shift like this easy to see. A warming city moves to the right along the temperature axis, and a later chapter uses exactly that motion to show how a place can move toward the edge of its biome.
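The size of each shift is a simple difference between the future and historical columns. The sketch below uses base R on a stand-in table with the same column names as the comparison above; the city labels are generic and every number is a placeholder, not a retrieved value.

```r
## stand-in for the comparison table; all values are placeholders
climate_change <- data.frame(
  city = c("city_a", "city_b", "city_c"),
  temp_historical = c(10, 10, 10),
  temp_future = c(11, 12, 11.5),
  precip_historical = c(100, 100, 100),
  precip_future = c(95, 100, 110)
)

## temperature change in degrees Celsius
climate_change$temp_change <-
  climate_change$temp_future - climate_change$temp_historical

## precipitation change as a percentage of the historical value
climate_change$precip_change_pct <-
  100 * (climate_change$precip_future - climate_change$precip_historical) /
  climate_change$precip_historical

climate_change[, c("city", "temp_change", "precip_change_pct")]
```

Expressing precipitation change as a percentage makes wet and dry places comparable, since the same absolute change means more where the baseline is small.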
5.7 A larger example: the botanical gardens
The examples so far have used a few locations typed in by hand. get_climate() works just as well on a large set of coordinates read from a file. The whittakerr package includes a table of botanical garden locations, and this section uses the California gardens from it to retrieve climate at scale.
The table ships with the package. Find it with system.file() and read it:
## locate the botanical-gardens file bundled with whittakerr
gardens_file <- system.file("extdata", "Bot_Garden_Geocode_CSV.csv",
                            package = "whittakerr")
## the gardens CSV is in Windows-1252, a legacy Windows
## encoding; read_csv() assumes UTF-8 unless told otherwise
gardens <- read_csv(gardens_file,
                    locale = locale(encoding = "Windows-1252"))
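Why the encoding has to be declared can be seen on a single word. The sketch below uses base R and a made-up example string, not a value from the gardens file. It assumes iconv() on your platform accepts "CP1252", another name for Windows-1252; accepted encoding names vary by platform.

```r
## the bytes for "Jardín" as a Windows-1252 file would store them;
## 0xed is the single byte for the accented i in that encoding
raw_bytes <- as.raw(c(0x4a, 0x61, 0x72, 0x64, 0xed, 0x6e))
as_stored <- rawToChar(raw_bytes)

## interpreted as UTF-8, the lone 0xed byte is not a valid sequence
stored_is_valid <- validUTF8(as_stored)

## declaring the source encoding converts the bytes correctly
as_utf8 <- iconv(as_stored, from = "CP1252", to = "UTF-8")
converted_is_valid <- validUTF8(as_utf8)

c(stored = stored_is_valid, converted = converted_is_valid)
```

The locale argument in the read_csv() call above plays the same role as the from argument here: it tells the reader how to interpret the bytes before treating them as text.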
Each row is one garden, with its name, city, state, and coordinates. Keep just the California gardens:
## keep only the California gardens
ca_gardens <- filter(gardens, State == "CA")
The California subset holds 65 gardens, enough that entering them by hand would be slow and error-prone. Retrieving climate for all of them is still a single call:
## retrieve climate for every California garden in one call
ca_climate <- get_climate(lon = ca_gardens$lon, lat = ca_gardens$lat)
The 2.5 arc-minute historical grid is already cached from the earlier examples, so this call downloads nothing. It is 65 cell lookups and returns at once. Combine the climate values with the garden names, using the same two-step pattern as the cities example:
## step 1: take just the climate columns
garden_values <- select(ca_climate, mat_c, map_cm)
## step 2: attach them to the garden names and cities
ca_result <- bind_cols(select(ca_gardens, Name, City), garden_values)
## show the first eight gardens
ca_result |>
  slice_head(n = 8) |>
  gt() |>
  tab_source_note("Climate values: WorldClim 2.1, 1970–2000 baseline.")
| Name | City | mat_c | map_cm |
|---|---|---|---|
| Los Angeles County Arboretum and Botanic Garden | Arcadia | 18.42833 | 43.1 |
| Regional Parks Botanic Garden | Berkeley | 14.94317 | 64.2 |
| Marin-Bolinas Botanical Gardens | Bolinas | 13.76615 | 108.1 |
| Harland Hand Memorial Garden | El Cerrito | 15.09635 | 69.1 |
| San Diego Botanic Garden (formerly Quail Botanical Gardens) | | | |
The full California set spans a wide climate range, from the cool, wet gardens of the far north coast to the hot, dry gardens of the southern deserts. That spread is the point. One get_climate() call has placed every garden on the same two climate axes, ready for the Whittaker diagram. The Basic Whittaker Diagrams chapter takes this same California result and plots all of the gardens together, and a later chapter uses them to ask which gardens sit closest to a biome boundary, and so face the most change as the climate shifts.