5 Retrieving Climate Data

Every biome classification in this document begins with two climate numbers: an annual mean temperature and an annual precipitation for a place. This chapter covers how to get those numbers. The get_climate() function takes one or more coordinate pairs and returns the climate values at each.

Before the function itself, a few words on where the climate data comes from. WorldClim, the source get_climate() draws on, is a real departure from how ecologists have traditionally obtained climate data, and the departure is worth understanding.

5.1 Where climate data comes from

For most of the history of ecology, climate data came from weather stations. Using it meant finding the right station, downloading its high-frequency records (often daily), and aggregating those records to the timescale the analysis needed, usually monthly or annual averages. This worked, but it had real limitations. Weather stations are unevenly distributed: dense in populated and agricultural regions, sparse in deserts, mountains, and the high latitudes. The periods of record vary from station to station, so two nearby stations might cover different decades. And for a remote location, or for an analysis spanning a large area, there might be no well-placed station at all. The data existed, but getting consistent climate values for an arbitrary point on Earth was genuinely hard.

WorldClim takes a different approach. Rather than serving station records directly, it interpolates them into continuous global surfaces. The current version, WorldClim 2.1, is built from the records of somewhere between 9,000 and 60,000 weather stations (the count varies by climate variable). Those station records, all covering the 1970–2000 period, are interpolated across space using a thin-plate spline method. The interpolation uses covariates that improve it where stations are sparse: elevation, distance to the coast, and satellite measurements of land surface temperature and cloud cover from the MODIS instrument aboard NASA's Terra and Aqua satellites. The satellite covariates matter most in exactly the data-poor regions where station interpolation alone would struggle.
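WorldClim's thin-plate spline with covariates is too involved for a short example. The basic move, turning scattered station values into an estimate at any point, can still be sketched with a simpler, swapped-in technique: inverse-distance weighting (IDW). The stations, coordinates, and values below are toys on a flat plane, and idw() is a hypothetical helper. It is not how WorldClim or get_climate() works.

```r
## inverse-distance weighting (IDW): a simple stand-in for spatial
## interpolation. This is NOT the thin-plate spline WorldClim uses,
## and it ignores covariates such as elevation.
idw <- function(x, y, stations, power = 2) {
  d <- sqrt((stations$x - x)^2 + (stations$y - y)^2)
  ## a target sitting exactly on a station takes that station's value
  if (any(d == 0)) return(stations$value[which(d == 0)[1]])
  w <- 1 / d^power                     # nearer stations weigh more
  sum(w * stations$value) / sum(w)     # weighted average of station values
}

## four toy stations on a flat plane (made-up coordinates and values)
stations <- data.frame(
  x     = c(0, 10, 0, 10),
  y     = c(0, 0, 10, 10),
  value = c(10, 20, 10, 20)
)

## estimate values on a small grid of target points between the stations
targets <- expand.grid(x = seq(0, 10, by = 5), y = seq(0, 10, by = 5))
targets$estimate <- mapply(idw, targets$x, targets$y,
                           MoreArgs = list(stations = stations))
targets
```

Each estimate is a weighted average of the station values, with nearer stations weighted more heavily. The thin-plate spline instead fits a smooth surface, and WorldClim adds covariates that IDW has no way to use.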

The result is a climate value for every cell of a global grid covering all terrestrial land surfaces. There is no station to find: a coordinate anywhere on land returns a climate value, because the interpolation has already filled every cell. The time period is consistent: every cell represents the same 1970–2000 baseline. And the data is already aggregated to the annual and monthly summaries an ecological analysis usually wants, including the nineteen derived bioclimatic variables. This document uses two of them: BIO1, annual mean temperature, and BIO12, annual precipitation.
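BIO1 and BIO12 are simple summaries of monthly climate. The sketch below uses made-up monthly values for a single cell and the common bioclimatic definitions, as implemented for example by biovars() in the dismo package. BIO1 is the mean of the twelve monthly mean temperatures, each taken as the midpoint of that month's minimum and maximum. BIO12 is the sum of the twelve monthly precipitation totals.

```r
## hypothetical monthly climate for one grid cell (made-up numbers)
tmin <- c(2, 3, 5, 8, 11, 14, 16, 16, 13, 9, 5, 3)            # deg C
tmax <- c(8, 10, 13, 16, 20, 23, 26, 26, 22, 17, 11, 8)       # deg C
prec <- c(90, 70, 65, 50, 45, 35, 20, 25, 40, 75, 100, 95)    # mm

## BIO1: annual mean temperature, the mean of the 12 monthly means,
## each monthly mean being the midpoint of tmin and tmax
tavg <- (tmin + tmax) / 2
bio1 <- mean(tavg)

## BIO12: annual precipitation, the sum of the 12 monthly totals
bio12 <- sum(prec)

## the centimeter form used by the Whittaker functions
bio12_cm <- bio12 / 10

c(BIO1 = bio1, BIO12_mm = bio12, BIO12_cm = bio12_cm)
```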

WorldClim distributes the data as GeoTIFF files. A GeoTIFF is a raster format: a rectangular grid of cells, each cell holding one value, with embedded coordinate metadata so the grid maps onto real geographic space. The get_climate() function handles the download and the cell lookup, so the GeoTIFF format is something the function manages rather than something you interact with directly.
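get_climate() handles the cell lookup internally, and its internals aren't shown here. The arithmetic underneath any such lookup is short, though. The sketch below assumes a global grid whose first cell sits at the top-left corner (longitude -180, latitude 90), with square cells `res` degrees on a side. cell_index() is a hypothetical helper. In practice a raster package such as terra does this with its extract() function, reading the origin and cell size from the GeoTIFF's embedded metadata.

```r
## map longitude/latitude to the row and column of a regular global grid
## (hypothetical helper; assumes the grid starts at xmin = -180, ymax = 90)
cell_index <- function(lon, lat, res, xmin = -180, ymax = 90) {
  col <- floor((lon - xmin) / res) + 1   # columns count eastward from xmin
  row <- floor((ymax - lat) / res) + 1   # rows count southward from ymax
  data.frame(lon = lon, lat = lat, row = row, col = col)
}

## 2.5 arc-minutes is 2.5 / 60 of a degree: 8640 columns by 4320 rows
res_2.5m <- 2.5 / 60

## Honolulu
cell_index(lon = -157.86, lat = 21.31, res = res_2.5m)

## the function is vectorized: the top-left and bottom-right corners
cell_index(lon = c(-180, 179.99), lat = c(90, -89.99), res = res_2.5m)
```

A point lying exactly on a cell edge is assigned to the cell to its east or south. That choice is arbitrary, but any fixed rule is fine as long as it is applied consistently.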

The storage strategy changes with resolution. At 10, 5, and 2.5 arc-minutes, the whole world fits in a single GeoTIFF per variable, a few hundred megabytes at most. At 30 arc-seconds (roughly one kilometer at the equator) the global dataset is far larger. Rather than one enormous file, the 30-arcsecond data is fetched in tiles, each covering a block of the globe, so you download only the tiles that cover your area of interest. The get_climate() function manages the distinction: at the coarser resolutions it fetches the single global grid; at 30 arc-seconds it fetches the tile or tiles your points fall within. Either way the download happens once and is cached. This document works mostly at 2.5 arc-minutes, the choice discussed in the Scale chapter, with one finer-resolution example to show what 30 arc-seconds reveals.
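Deciding which tiles to fetch comes down to snapping each point to the corner of the tile that contains it, then keeping the distinct results. The sketch below assumes 30 × 30 degree tiles aligned to multiples of 30 degrees. That tile size is an assumption for illustration, not a documented property of get_climate(), and tile_of() is hypothetical.

```r
## lower-left corner of the tile containing each point (hypothetical
## helper; assumes square tiles of `tile_size` degrees). Points on the
## far east or north edge of the world are kept in the last tile.
tile_of <- function(lon, lat, tile_size = 30) {
  tile_lon <- pmin(floor((lon + 180) / tile_size) * tile_size - 180,
                   180 - tile_size)
  tile_lat <- pmin(floor((lat + 90) / tile_size) * tile_size - 90,
                   90 - tile_size)
  data.frame(tile_lon = tile_lon, tile_lat = tile_lat)
}

pts <- data.frame(
  place = c("Honolulu", "Los Angeles", "Seattle", "Waianae coast"),
  lon   = c(-157.86, -118.24, -122.33, -158.17),
  lat   = c(21.31, 34.05, 47.61, 21.45)
)

## only the distinct tiles need downloading: both Oahu points share one
needed <- unique(tile_of(pts$lon, pts$lat))
needed
```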

5.2 Setup

Every code-bearing chapter begins with a setup chunk. Run it first.

## the whittakerr toolkit: get_climate(), name_biome(), plot_biomes()
library(whittakerr)
## read_csv() for reading inline data tables
library(readr)
## data manipulation: select(), filter(), mutate(), slice_head(),
## bind_cols(), bind_rows(), and tibble()
library(dplyr)
## formatted tables for display
library(gt)
## suppress read_csv() column-type messages for the whole chapter
options(readr.show_col_types = FALSE)

5.3 Retrieving climate for one location

The get_climate() function takes a longitude and a latitude and returns the climate at that point. Here it is for Honolulu:

## retrieve historical climate for Honolulu
honolulu <- get_climate(lon = -157.86, lat = 21.31)
## show the result
honolulu

# A tibble: 1 × 6
    lon   lat mat_c map_mm map_cm scenario  
  <dbl> <dbl> <dbl>  <dbl>  <dbl> <chr>     
1 -158.  21.3  24.7   1253   125. historical

The first call to get_climate() triggers the WorldClim download described above. At 2.5 arc-minute resolution this is a few hundred megabytes, and it takes a few minutes. The download happens once. Afterward the data is cached, and subsequent calls are fast.
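The download-once behavior follows a common caching pattern: build a local file path from the source URL, and fetch only if that file is not already on disk. The sketch below is a minimal base-R version in which cached_file(), the cache directory name, and the example URL are all hypothetical; it is not whittakerr's code. The fetching step is passed in as a function, so the demonstration can count calls without touching the network.

```r
## download-once cache (hypothetical helper): fetch only if the file is
## not already on disk, and return the local path either way
cached_file <- function(url,
                        cache_dir = tools::R_user_dir("climatecache", which = "cache"),
                        fetch = function(url, dest) {
                          utils::download.file(url, dest, mode = "wb")
                        }) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, basename(url))
  if (!file.exists(dest)) {
    fetch(url, dest)   # first call: fetch and store
  }
  dest                 # every call: the local path
}

## demonstration with a stand-in fetcher that counts how often it runs
n_fetches <- 0
fake_fetch <- function(url, dest) {
  n_fetches <<- n_fetches + 1
  writeLines("placeholder", dest)
}
demo_dir <- file.path(tempdir(), "cache-demo")
unlink(demo_dir, recursive = TRUE)            # start from an empty cache
url <- "https://example.org/wc_grid.tif"      # hypothetical URL

path1 <- cached_file(url, demo_dir, fake_fetch)
path2 <- cached_file(url, demo_dir, fake_fetch)
n_fetches   # 1: the second call found the file already cached
```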

The returned object is a table with one row per location. The columns are the longitude and latitude you supplied, then mat_c (mean annual temperature, in degrees Celsius), map_mm (mean annual precipitation, in millimeters), map_cm (the same precipitation in centimeters), and scenario (here, the historical 1970–2000 baseline). For Honolulu, the result is a warm mean annual temperature just under 25 degrees and moderate precipitation of about 125 centimeters: the coordinate falls on the drier, leeward side of Oahu.

Both precipitation columns are provided for convenience. WorldClim reports precipitation in millimeters; the Whittaker biome functions in this document expect centimeters. Having both means you can pass map_cm straight into name_biome() or plot_biomes() without converting anything yourself.

5.4 Several locations at once

get_climate() is vectorized. Pass it a vector of longitudes and a matching vector of latitudes, and it returns one row per location. The natural way to organize several locations is a table.
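Each longitude is paired with the latitude in the same position, so a vectorized coordinate function has to guard against mismatched inputs. The sketch below shows a minimal set of such checks. check_coords() is hypothetical, and whether get_climate() runs the same checks is not documented here. The range test also catches the common mistake of swapping longitude and latitude, at least when the longitude's magnitude exceeds 90.

```r
## hypothetical input checks for a vectorized coordinate function
check_coords <- function(lon, lat) {
  if (length(lon) != length(lat)) {
    stop("lon and lat must have the same length (got ", length(lon),
         " and ", length(lat), ")", call. = FALSE)
  }
  if (any(abs(lon) > 180, na.rm = TRUE)) {
    stop("a longitude is outside [-180, 180]", call. = FALSE)
  }
  if (any(abs(lat) > 90, na.rm = TRUE)) {
    stop("a latitude is outside [-90, 90]", call. = FALSE)
  }
  invisible(TRUE)
}

## passes: one latitude per longitude, all in range
check_coords(lon = c(-157.86, -118.24), lat = c(21.31, 34.05))

## fails: three longitudes but only two latitudes
try(check_coords(lon = c(-157.86, -118.24, -122.33), lat = c(21.31, 34.05)))

## fails: swapped arguments put -157.86 in the latitude slot
try(check_coords(lon = 21.31, lat = -157.86))
```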

Here are three Pacific-coast cities with very different climates, written as an inline data table:

## three Pacific-coast cities as an inline data table;
## I() marks the string as literal data rather than a file path
cities <- read_csv(I(
  "name,         lon,      lat
   Honolulu,     -157.86,  21.31
   Los Angeles,  -118.24,  34.05
   Seattle,      -122.33,  47.61"))

## display the table with its data source noted
cities |>
  gt() |>
  tab_source_note("Approximate city-center coordinates.")

name	lon	lat
Honolulu	-157.86	21.31
Los Angeles	-118.24	34.05
Seattle	-122.33	47.61
Approximate city-center coordinates.

Building the data as a table, rather than as separate vectors of names, longitudes, and latitudes, has a practical benefit. Each row is one location, and its name, longitude, and latitude stay together. Adding or removing a city is a one-line edit, and the values cannot drift out of alignment. Data entry is a common source of analysis errors, and a table keeps each observation intact.
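The alignment risk is easy to demonstrate. The sketch below keeps the three cities first as parallel vectors, then as one data frame, and removes Los Angeles from each.

```r
## three parallel vectors, aligned only by position
name <- c("Honolulu", "Los Angeles", "Seattle")
lon  <- c(-157.86, -118.24, -122.33)
lat  <- c(21.31, 34.05, 47.61)

## remove Los Angeles from the names but forget the coordinates:
## "Seattle" now sits in position 2, next to Los Angeles's longitude
name_edited <- name[-2]
lon[match("Seattle", name_edited)]   # -118.24, the wrong city's value

## the same data as one table: dropping a row drops all of its values
cities_df   <- data.frame(name, lon, lat)
cities_kept <- cities_df[cities_df$name != "Los Angeles", ]
cities_kept$lon[cities_kept$name == "Seattle"]   # -122.33, still correct
```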

Now retrieve the climate for all three cities in a single call:

## retrieve climate for all three cities in a single call
cities_climate <- get_climate(lon = cities$lon, lat = cities$lat)

get_climate() returns its climate columns in the same row order as the coordinates it received, so the results line up with the cities table row for row. Combining the two takes two steps. The get_climate() result carries its own lon and lat columns, so binding the two tables whole would duplicate them. Instead, first select just the climate columns you want, then attach them to the cities table:

## step 1: take just the climate columns from the get_climate() result
climate_values <- select(cities_climate, mat_c, map_mm, map_cm)

## step 2: attach those columns to the cities table
cities_result <- bind_cols(cities, climate_values)

## display the combined table with the climate-data source noted
cities_result |>
  gt() |>
  tab_source_note("Climate values: WorldClim 2.1, 1970–2000 baseline.")

name	lon	lat	mat_c	map_mm	map_cm
Honolulu	-157.86	21.31	24.70955	1253	125.3
Los Angeles	-118.24	34.05	18.82033	403	40.3
Seattle	-122.33	47.61	10.99050	1012	101.2
Climate values: WorldClim 2.1, 1970–2000 baseline.
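The two-step pattern relies on get_climate() returning rows in the order it received them. A cheap safeguard is to confirm, before binding, that the coordinates in both tables agree row by row. attach_climate() is a hypothetical base-R helper, not part of whittakerr. It assumes both tables carry lon and lat columns, as the cities table and the get_climate() result do, and it returns a plain data frame.

```r
## attach climate columns to a site table, but only after confirming
## that the rows refer to the same places (hypothetical helper)
attach_climate <- function(sites, climate,
                           cols = c("mat_c", "map_mm", "map_cm"),
                           tol = 1e-6) {
  if (nrow(sites) != nrow(climate)) stop("row counts differ", call. = FALSE)
  same_place <- abs(sites$lon - climate$lon) < tol &
                abs(sites$lat - climate$lat) < tol
  if (!all(same_place)) stop("rows are not in the same order", call. = FALSE)
  cbind(sites, climate[cols])
}

## usage with two of the cities and their values from the table above
sites <- data.frame(name = c("Honolulu", "Los Angeles"),
                    lon  = c(-157.86, -118.24),
                    lat  = c(21.31, 34.05))
climate <- data.frame(lon    = c(-157.86, -118.24),
                      lat    = c(21.31, 34.05),
                      mat_c  = c(24.70955, 18.82033),
                      map_mm = c(1253, 403),
                      map_cm = c(125.3, 40.3))
combined <- attach_climate(sites, climate)
combined

## a reordered climate table is caught instead of silently misaligned
try(attach_climate(sites, climate[2:1, ]))
```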

The three cities span a wide climate range. Honolulu is warm and moderately wet. Los Angeles is mild and dry. Seattle is cool and wet. These differences are exactly what the Whittaker diagram organizes, and later chapters place these same cities on it.

5.5 A finer resolution

Every example so far has used the document’s standard resolution of 2.5 arc-minutes. The resolution argument changes that. Its finest setting, 30 arc-seconds, can resolve local climate features that the coarser grid averages away. The Scale chapter makes this argument in general; here is a retrieval that compares the two resolutions at a single point.

The point is on Oahu’s Waianae coast, the island’s dry leeward side. At 2.5 arc-minutes a single grid cell is roughly four and a half kilometers across, large enough to blend the dry coastal lowland with the wetter slopes of the Waianae Range just inland. Here is the climate at that point, at the standard resolution:

## a point on Oahu's dry leeward (Waianae) coast, at the default 2.5 arc-minute resolution
leeward_coarse <- get_climate(lon = -158.17, lat = 21.45)
## show the result
leeward_coarse

# A tibble: 1 × 6
    lon   lat mat_c map_mm map_cm scenario  
  <dbl> <dbl> <dbl>  <dbl>  <dbl> <chr>     
1 -158.  21.4  24.0    812   81.2 historical

Now the same coordinate at 30 arc-seconds. As the earlier section described, the 30-arcsecond data is distributed as tiles rather than one global grid, so the first call downloads the tile covering Oahu. Like the global grid, the tile is cached after the first fetch.

## the same point at 30-arcsecond resolution
## (resolution is given in arc-minutes: 0.5 arc-minutes = 30 arc-seconds)
leeward_fine <- get_climate(lon = -158.17, lat = 21.45, resolution = 0.5)
## show the result
leeward_fine

# A tibble: 1 × 6
    lon   lat mat_c map_mm map_cm scenario  
  <dbl> <dbl> <dbl>  <dbl>  <dbl> <chr>     
1 -158.  21.4  23.8    823   82.3 historical
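The cell sizes behind these two results can be worked out directly. The quick sketch below assumes a spherical Earth with about 111.32 kilometers per degree of latitude, and narrows each cell's east-west width by the cosine of the latitude. cell_size_km() is a hypothetical helper.

```r
## approximate cell dimensions in km for a grid of res_deg degrees,
## on a spherical Earth; east-west width shrinks with cos(latitude)
cell_size_km <- function(res_deg, lat) {
  km_per_deg <- 111.32
  c(north_south = res_deg * km_per_deg,
    east_west   = res_deg * km_per_deg * cos(lat * pi / 180))
}

## the leeward Oahu point sits at about 21.45 degrees north
round(cell_size_km(2.5 / 60, 21.45), 2)   # 2.5 arc-minutes
round(cell_size_km(0.5 / 60, 21.45), 2)   # 30 arc-seconds
```

At this latitude a 2.5 arc-minute cell comes out at roughly 4.6 by 4.3 kilometers, and a 30-arcsecond cell at roughly 0.9 kilometers on a side. Because 2.5 arc-minutes is five times 30 arc-seconds, one coarse cell spans a 5 × 5 block of 25 fine cells.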

The two results are easier to read side by side. Label each one by its resolution, then stack them into a single table:

## label each result by its resolution
coarse_labeled <- mutate(leeward_coarse, resolution = "2.5 arc-minutes")
fine_labeled   <- mutate(leeward_fine,   resolution = "30 arc-seconds")

## stack the two into one table
leeward_compare <- bind_rows(coarse_labeled, fine_labeled)

## display temperature and precipitation at the two resolutions
leeward_compare |>
  select(resolution, mat_c, map_cm) |>
  gt() |>
  tab_source_note("Climate values: WorldClim 2.1, 1970–2000 baseline.")

resolution	mat_c	map_cm
2.5 arc-minutes	23.96122	81.2
30 arc-seconds	23.78750	82.3
Climate values: WorldClim 2.1, 1970–2000 baseline.

One might expect the finer grid to report the drier value, since its smaller cell isolates the leeward lowland while the larger cell blends that lowland with wetter ground nearby. At this particular point, though, the two resolutions nearly agree: the 30-arcsecond value is about 1 centimeter wetter (82.3 versus 81.2) and about 0.2 degrees Celsius cooler than the 2.5 arc-minute value. Whether a difference matters depends on the question being asked. For placing a city on the Whittaker diagram at continental scale, 2.5 arc-minutes is enough. For telling a dry pocket apart from its surroundings, 30 arc-seconds is the resolution that can make the pocket visible. The resolution argument is the single control over that tradeoff.
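The blending argument can be illustrated with toy numbers. The sketch below assumes that a coarse cell's value behaves like the average of the fine cells it covers. That is a simplification for illustration, not a description of how WorldClim built its coarser grids. The precipitation values are made up, and block_mean() is a hypothetical helper; for real rasters, terra's aggregate() function performs this kind of block summary.

```r
## toy 30-arcsecond precipitation (cm) for one 5 x 5 block, the footprint
## of a single 2.5 arc-minute cell; matrix() fills column by column, so
## the two western columns (dry lowland) get 60 and the rest (slopes) 110
fine <- matrix(c(rep(60, 10), rep(110, 15)), nrow = 5)

## average non-overlapping fact x fact blocks of a matrix
block_mean <- function(m, fact) {
  n_row <- nrow(m) %/% fact
  n_col <- ncol(m) %/% fact
  out <- matrix(NA_real_, n_row, n_col)
  for (i in seq_len(n_row)) {
    for (j in seq_len(n_col)) {
      rows <- ((i - 1) * fact + 1):(i * fact)
      cols <- ((j - 1) * fact + 1):(j * fact)
      out[i, j] <- mean(m[rows, cols])
    }
  }
  out
}

coarse <- block_mean(fine, fact = 5)
fine[1, 1]     # 60: a lowland point at the fine resolution
coarse[1, 1]   # 90: the same point at the coarse resolution
```

In this toy block, the coarse cell reports 90 centimeters for a lowland point that the fine grid puts at 60, because the coarse cell averages the lowland with the wetter slopes.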

5.6 Future climate

Every retrieval so far has returned the historical baseline, the WorldClim 1970–2000 average. The scenario argument also reaches projected future climate. Setting scenario = "future" switches the source from the WorldClim historical surfaces to a projection from CMIP6, Phase 6 of the Coupled Model Intercomparison Project.

A future projection is defined by three choices, and get_climate() exposes each as an argument. The gcm is the global climate model: a physics-based simulation of the atmosphere, ocean, and land. The ssp is the Shared Socioeconomic Pathway: a scenario for how population, economy, and emissions develop through the century. The period is the time window the projection averages over. The defaults used here are one mid-sensitivity model (MPI-ESM1-2-HR), the middle-of-the-road SSP2-4.5 pathway, and the 2041–2060 mid-century window.
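Because a projection is defined by three independent choices, the candidate projections form a grid of every combination. The sketch below builds that grid with base R's expand.grid(). The model names and codes follow WorldClim's CMIP6 naming, with each SSP written as digits like the "245" in the ssp column below. Whether get_climate() accepts each of these values is an assumption to check against its documentation.

```r
## every combination of model, pathway, and period
scenarios <- expand.grid(
  gcm    = c("MPI-ESM1-2-HR", "IPSL-CM6A-LR", "UKESM1-0-LL"),
  ssp    = c("126", "245", "585"),
  period = c("2041-2060", "2081-2100"),
  stringsAsFactors = FALSE
)
nrow(scenarios)   # 18: 3 models x 3 pathways x 2 periods
head(scenarios, 4)

## each row could then be passed to get_climate(..., scenario = "future"),
## assuming its gcm, ssp, and period arguments take these same strings
```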

How the models and the pathways differ from one another is a substantial subject, well beyond the scope of this chapter. The model-comparison framework is described by Eyring et al. (2016) in Geoscientific Model Development; the socioeconomic pathways are described by Riahi et al. (2017) in Global Environmental Change. The IPCC Sixth Assessment Report (Working Group I, 2021) synthesizes both for a general scientific readership. A careful analysis usually retrieves several models and pathways rather than one, and treats the spread among them as a measure of the projection’s uncertainty. The single default projection used here is enough to show the mechanics.
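Treating the spread among projections as a measure of uncertainty can be as simple as taking, for each location, the lowest and highest projected value. ensemble_spread() is a hypothetical helper. It assumes a table with one row per location per projection and the lon, lat, and mat_c columns of get_climate()'s output. The temperatures in the example are toy numbers, not model output.

```r
## per-location minimum, maximum, and range of projected temperature
ensemble_spread <- function(projections) {
  out <- aggregate(mat_c ~ lon + lat, data = projections,
                   FUN = function(x) c(min = min(x), max = max(x)))
  ## aggregate() stores the two summaries as a matrix column; flatten it
  data.frame(lon       = out$lon,
             lat       = out$lat,
             mat_min   = out$mat_c[, "min"],
             mat_max   = out$mat_c[, "max"],
             mat_range = out$mat_c[, "max"] - out$mat_c[, "min"])
}

## toy input: two locations, three projections each (not model output)
toy <- data.frame(
  lon   = rep(c(-157.86, -122.33), each = 3),
  lat   = rep(c(21.31, 47.61), each = 3),
  mat_c = c(25, 26, 27, 12, 13, 15)
)
ensemble_spread(toy)
```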

Retrieve the future climate for the same three cities:

## retrieve a future projection for the three cities
## (defaults: model MPI-ESM1-2-HR, pathway SSP2-4.5, period 2041-2060)
cities_future <- get_climate(lon = cities$lon, lat = cities$lat,
                             scenario = "future")
## show the result
cities_future

# A tibble: 3 × 9
    lon   lat mat_c map_mm map_cm scenario gcm           ssp   period   
  <dbl> <dbl> <dbl>  <dbl>  <dbl> <chr>    <chr>         <chr> <chr>    
1 -158.  21.3  25.6   1115  112.  future   MPI-ESM1-2-HR 245   2041-2060
2 -118.  34.0  20.2    391   39.1 future   MPI-ESM1-2-HR 245   2041-2060
3 -122.  47.6  12.4   1015  102.  future   MPI-ESM1-2-HR 245   2041-2060

The returned table carries three columns the historical results did not: gcm, ssp, and period, recording which projection produced the values. As with the historical grid, the CMIP6 data downloads on the first call and is cached afterward.

Set the future values next to the historical ones, one row per city:

## historical and future climate side by side
climate_change <- tibble(
  city              = cities$name,
  temp_historical   = cities_climate$mat_c,
  temp_future       = cities_future$mat_c,
  precip_historical = cities_climate$map_cm,
  precip_future     = cities_future$map_cm
)

## display the comparison, with a unit footnote on each set of
## columns whose units are not carried in the column name
climate_change |>
  gt() |>
  tab_footnote(
    footnote  = "Degrees Celsius.",
    locations = cells_column_labels(columns = c(temp_historical, temp_future))
  ) |>
  tab_footnote(
    footnote  = "Centimeters.",
    locations = cells_column_labels(columns = c(precip_historical, precip_future))
  ) |>
  tab_source_note(
    "Historical: WorldClim 2.1, 1970–2000. Future: CMIP6, MPI-ESM1-2-HR, SSP2-4.5, 2041–2060.")

city	temp_historical¹	temp_future¹	precip_historical²	precip_future²
Honolulu	24.70955	25.6	125.3	111.5
Los Angeles	18.82033	20.2	40.3	39.1
Seattle	10.99050	12.4	101.2	101.5
¹ Degrees Celsius.
² Centimeters.
Historical: WorldClim 2.1, 1970–2000. Future: CMIP6, MPI-ESM1-2-HR, SSP2-4.5, 2041–2060.
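The changes themselves are a subtraction and a percentage. In the chapter's own code they could be added to climate_change with dplyr's mutate(). The sketch below is self-contained instead and uses base R, with the values displayed in the table above.

```r
## the comparison table's displayed values
change <- data.frame(
  city              = c("Honolulu", "Los Angeles", "Seattle"),
  temp_historical   = c(24.70955, 18.82033, 10.99050),
  temp_future       = c(25.6, 20.2, 12.4),
  precip_historical = c(125.3, 40.3, 101.2),
  precip_future     = c(111.5, 39.1, 101.5)
)

## temperature change in degrees Celsius
change$temp_change_c <- change$temp_future - change$temp_historical

## precipitation change as a percentage of the historical value
change$precip_change_pct <- 100 *
  (change$precip_future - change$precip_historical) / change$precip_historical

change[, c("city", "temp_change_c", "precip_change_pct")]
```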

Each city’s projected mid-century temperature is warmer than its 1970–2000 baseline, by roughly 0.9 degrees in Honolulu and 1.4 degrees in Los Angeles and Seattle. The size of the increase would differ again under a different model or pathway. Precipitation shows no consistent direction: Honolulu loses about 11 percent of its annual total and Los Angeles about 3 percent, while Seattle is essentially unchanged. Precipitation is the harder of the two variables to project, and a warming climate does not move every place the same way. The Whittaker diagram makes a shift like this easy to see. A warming city moves to the right along the temperature axis, and a later chapter uses exactly that motion to show how a place can move toward the edge of its biome.

5.7 A larger example: the botanical gardens

The examples so far have used a few locations typed in by hand. get_climate() works just as well on a large set of coordinates read from a file. The whittakerr package includes a table of botanical garden locations, and this section uses the California gardens from it to retrieve climate at scale.

The table ships with the package. Find it with system.file() and read it:

## locate the botanical-gardens file bundled with whittakerr
gardens_file <- system.file("extdata", "Bot_Garden_Geocode_CSV.csv",
                            package = "whittakerr")

## the gardens CSV is in Windows-1252, a legacy Windows
## encoding; read_csv() assumes UTF-8 unless told otherwise
gardens <- read_csv(gardens_file,
                    locale = locale(encoding = "Windows-1252"))
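The effect of declaring an encoding can be shown with a throwaway file. The sketch writes a one-row CSV whose bytes are Windows-1252, where the single byte 0xE9 stands for "é", and reads it back with base R's read.csv(), declaring the encoding through fileEncoding. The file and its contents are made up. "CP1252" is the iconv name for Windows-1252 on common platforms.

```r
## write a tiny CSV whose bytes are Windows-1252: 0xE9 is "é" in that
## encoding, but on its own it is not valid UTF-8
path  <- tempfile(fileext = ".csv")
bytes <- c(charToRaw("name,lon\nCaf"), as.raw(0xE9),
           charToRaw(" Garden,-118.2\n"))
writeBin(bytes, path)

## declare the file's encoding so R converts it on the way in;
## readr does the same job via locale(encoding = "Windows-1252")
gardens_demo <- read.csv(path, fileEncoding = "CP1252")
gardens_demo$name
```

Without the declaration, the same byte would be read as invalid UTF-8 and the garden name would come out garbled.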

Each row is one garden, with its name, city, state, and coordinates. Keep just the California gardens:

## keep only the California gardens
ca_gardens <- filter(gardens, State == "CA")

The California subset holds 65 gardens, enough that entering them by hand would be slow and error-prone. Retrieving climate for all of them is still a single call:

## retrieve climate for every California garden in one call
ca_climate <- get_climate(lon = ca_gardens$lon, lat = ca_gardens$lat)

The 2.5 arc-minute historical grid is already cached from the earlier examples, so this call downloads nothing. It performs 65 cell lookups and returns almost immediately. Combine the climate values with the garden names, using the same two-step pattern as the cities example:

## step 1: take just the climate columns
garden_values <- select(ca_climate, mat_c, map_cm)

## step 2: attach them to the garden names and cities
ca_result <- bind_cols(select(ca_gardens, Name, City), garden_values)

## show the first eight gardens
ca_result |>
  slice_head(n = 8) |>
  gt() |>
  tab_source_note("Climate values: WorldClim 2.1, 1970–2000 baseline.")

Name	City	mat_c	map_cm
Los Angeles County Arboretum and Botanic Garden	Arcadia	18.42833	43.1
Regional Parks Botanic Garden	Berkeley	14.94317	64.2
Marin-Bolinas Botanical Gardens	Bolinas	13.76615	108.1
Harland Hand Memorial Garden	El Cerrito	15.09635	69.1
San Diego Botanic Garden (formerly Quail Botanical Gardens)	Encinitas	16.26250	29.3
Fullerton Arboretum	Fullerton	18.42517	36.5
University of California Irvine Arboretum	Irvine	17.78083	32.8
Blake Garden	Kensington	14.59500	73.7
Climate values: WorldClim 2.1, 1970–2000 baseline.

The full California set spans a wide climate range, from the cool, wet gardens of the far north coast to the hot, dry gardens of the southern deserts. That spread is the point. One get_climate() call has placed every garden on the same two climate axes, ready for the Whittaker diagram. The Basic Whittaker Diagrams chapter takes this same California result and plots all of the gardens together. A later chapter uses them to ask which gardens sit closest to a biome boundary, and so are most likely to cross into a neighboring biome as the climate shifts.