Getting started
First load the `devtools` package, which is used for installing `teamlucc`. Install the `devtools` package if it is not already installed:
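For example:

```r
# Install devtools from CRAN if it is not already present, then load it
if (!require("devtools")) {
    install.packages("devtools")
    library(devtools)
}
```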
Now load the `teamlucc` package, using `devtools` to install it from GitHub if it is not yet installed:
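Something like the following should work (the `azvoleff/teamlucc` repository path is an assumption; check the package homepage for the correct location):

```r
# Install teamlucc from GitHub on first use, then load it
if (!require("teamlucc")) {
    devtools::install_github("azvoleff/teamlucc")  # repository path assumed
    library(teamlucc)
}
```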
Also load the `rgdal` package, which is needed for reading and writing shapefiles:
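```r
library(rgdal)
```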
Collect training data for supervised classification
The first step in the classification is putting together a training dataset. `teamlucc` includes a function to output a shapefile that can be used for collecting training data. Here we are collecting training data for the `L5TSR_1986` raster (a portion of a 1986 Landsat 5 surface reflectance image) that is included with the `teamlucc` package. Use the `get_extent_polys` function to quickly construct a shapefile in the same coordinate system as the image:
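A sketch of that step, assuming `get_extent_polys` takes the raster as its only argument:

```r
# Build a polygon matching the extent and CRS of the training image
train_polys <- get_extent_polys(L5TSR_1986)
```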
Add an empty field named “class_1986” to the object, and delete the extent polygon (because we don’t need it, and just want an empty shapefile):
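A minimal sketch of those two steps (the sp-style subsetting assumes `get_extent_polys` returns a `SpatialPolygonsDataFrame`):

```r
train_polys$class_1986 <- ""      # add an empty character field for class labels
train_polys <- train_polys[-1, ]  # drop the extent polygon, leaving an empty template
```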
Now save the `train_polys` object to a shapefile using `writeOGR` from the `rgdal` package. The `"."` below just means "save the shapefile in the current directory".
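For example:

```r
writeOGR(train_polys, ".", "training_data", driver = "ESRI Shapefile")
```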
Open the generated "training_data.shp" shapefile in a GIS program (I recommend QGIS) and digitize a number of polygons in each of the land cover classes you want to map. For this example, we will simply classify "Forest" and "Non-forest". For each polygon you digitize, record the cover type in the "class_1986" column. After digitizing a number of polygons within each class, save the shapefile, and load it back into R using `train_polys <- readOGR(".", "training_data")`.
Alternatively, for this example, you can use the thirty training polygons included in the `teamlucc` package in the `L5TSR_1986_2001_training` dataset:
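```r
train_polys <- L5TSR_1986_2001_training
```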
Classify image
First we need to extract the training data from our training image, for each pixel within the polygons in our `train_polys` dataset. `get_pixels` will use the `training` parameter that we pass to determine the fraction of the training data to use in training the classifier. If set to 1, ALL of the training data will be used to train the classifier, leaving no independent data for validation. If set to a fraction (for example .6), then only 60% of the data (randomly selected) will be used in training, and 40% will be preserved as an independent sample for use in testing.
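A sketch of the call; the `train_data` name and the argument names (`class_col` in particular) are assumptions, so check `?get_pixels` for the actual signature:

```r
train_data <- get_pixels(L5TSR_1986, train_polys, class_col = "class_1986",
                         training = .6)
```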
Note: Validation data should generally be collected separately from training data anyway, to ensure the image is randomly sampled (training data collection is almost never random), so in most cases I don't recommend making heavy use of the `training` parameter. It can be useful, though, for testing.
A summary method is provided by `teamlucc` for printing summary statistics on training datasets:
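For example, using the `train_data` object from above:

```r
summary(train_data)
```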
To perform the actual image classification, we will use the `classify` function. Prior to using that function, we need to train a classifier. The `train_classifier` function automates training a random forest or support vector machine (SVM) classifier. There are many options that can be provided to `train_classifier` - for this example we will just use the defaults. The default is to use a random forest classifier.
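With the defaults this is a one-liner (assuming `train_classifier` takes the extracted training data as its first argument; `clf` is just a name chosen here):

```r
clf <- train_classifier(train_data)
```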
Now we can use the `classify` function to perform the image classification:
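A sketch, assuming `classify` accepts the image and the trained classifier:

```r
classification <- classify(L5TSR_1986, clf)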
To see the predicted classes, use `spplot`:
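Assuming the predicted classes are stored in a `classes` element of the output (an assumption; inspect the returned object to confirm):

```r
spplot(classification$classes)
```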
We can also see the class probabilities (the per-pixel probabilities of membership in each class):
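Again assuming an element name (`probs` here) for the probability layers:

```r
spplot(classification$probs)
```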
The output from `classify` also includes a table indicating the coding for the output:
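For example (the `codes` element name is an assumption):

```r
classification$codes
```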
Parallel processing
Training a classifier and predicting land cover classes is very CPU-intensive. If you have a machine with multiple processors (or multiple cores), using more than one processor can significantly increase the speed of some calculations. `teamlucc` supports parallel computations (using the capabilities of the `raster` package). To enable this functionality, first install the `doParallel` package if it is not already installed, and load the package:
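For example:

```r
if (!require("doParallel")) {
    install.packages("doParallel")
    library(doParallel)
}
```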
Now, just call `registerDoParallel()`, and by default any calculations that are coded to run in parallel will use half of the available CPUs on your machine. You can also specify a number of CPUs to use by running, for example, `registerDoParallel(2)` to use two CPUs. The `get_pixels`, `train_classifier`, and `classify` functions in `teamlucc` all support parallel computation, and will run in parallel automatically if you have called `registerDoParallel`.
Below is the code for the same classification problem we just ran, but this
time we run the classification in parallel:
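Putting the pieces together (reusing the assumed argument and element names from above):

```r
registerDoParallel()  # use the default number of CPUs

train_data <- get_pixels(L5TSR_1986, train_polys, class_col = "class_1986",
                         training = .6)
clf <- train_classifier(train_data)
classification <- classify(L5TSR_1986, clf)
```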
Accuracy assessment
Conducting a thorough accuracy assessment is one of the most important components of image classification. The `teamlucc` package includes an `accuracy` function to assist with measuring the accuracy of image classifications. In addition to the standard contingency tables often used for describing accuracy, `accuracy` also calculates "quantity disagreement" and "allocation disagreement" as introduced by Pontius and Millones (2011).
Unbiased contingency tables can be calculated with `accuracy` by supplying a `pop` parameter. `accuracy` provides 95% confidence intervals for user's, producer's, and overall accuracies, calculated as in Olofsson et al. (2013).
To calculate a basic contingency table, assuming that population frequencies of the observed classes can be estimated from the classification output, and using the 40% of pixels that were excluded from training the classifier as testing data, run the `accuracy` function using the model calculated above:
Note the warning from `accuracy`, which is reminding us that we did not provide population frequencies for the classes.
A `summary` method for the `accuracy` object is provided by `teamlucc`, and calculates user's, producer's, and overall accuracy, as well as quantity and allocation disagreement:
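For example:

```r
summary(acc)
```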
1. Pontius, R. G., and M. Millones. 2011. Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment. International Journal of Remote Sensing 32:4407-4429.
2. Olofsson, P., G. M. Foody, S. V. Stehman, and C. E. Woodcock. 2013. Making better use of accuracy data in land change studies: Estimating accuracy and area and quantifying uncertainty using stratified estimation. Remote Sensing of Environment 129:122-131.