First load the
devtools package, used for installing
teamlucc. Install the
devtools package if it is not already installed:
Now load the
teamlucc package, using
devtools to install it from github if
it is not yet installed:
Also load the
rgdal package needed for reading/writing shapefiles:
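The install-if-missing pattern described above can be sketched as a small helper. ensure_package is a hypothetical name (it is not part of teamlucc or devtools), and the GitHub repository path in the commented usage (azvoleff/teamlucc) is an assumption that you should check before running it.

```r
# Hypothetical helper (not part of teamlucc or devtools): install a package
# only if it cannot already be loaded, then attach it.
ensure_package <- function(pkg, installer = function(p) install.packages(p)) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)  # e.g. install from CRAN, or from GitHub via devtools
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage (needs network access, so it is left commented out here):
# ensure_package("devtools")
# ensure_package("teamlucc",  # the repository path below is an assumption
#                installer = function(p) devtools::install_github("azvoleff/teamlucc"))
# ensure_package("rgdal")
```

Passing the installer as an argument keeps one helper for both CRAN packages and packages installed from GitHub.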
Collect training data for supervised classification
The first step in the classification is putting together a training dataset.
teamlucc includes a function to output a shapefile that can be used for
collecting training data. Here we are collecting training data for the
L5TSR_1986 raster (a portion of a 1986 Landsat 5 surface reflectance image)
that is included with the
teamlucc package. Use the
function to quickly construct a shapefile in the same coordinate system as the L5TSR_1986 raster:
Add an empty field named “class_1986” to the resulting object, then delete the extent polygon itself (we don’t need it; we just want an empty shapefile):
Now save the
train_polys object to a shapefile using
writeOGR from the
rgdal package. The
"." below just means “save the shapefile in the current working directory.”
Open the generated “training_data.shp” shapefile in a GIS program (I recommend
QGIS) and digitize a number of polygons in each of the
land cover classes you want to map. For this example, we will simply classify
“Forest” and “Non-forest”. For each polygon you digitize, record the cover type
in the “class_1986” column. After digitizing a number of polygons within each
class, save the shapefile, and load it back into R using
train_polys <- readOGR(".", "training_data").
Alternatively, for this example, you can use the thirty training polygons included in the
teamlucc package in the
First we need to extract the training data from our training image for each
pixel within our training polygons.
get_pixels uses the
training parameter that we pass to it to
determine the fraction of the training data used to train the classifier.
If set to 1, ALL of the training data will be used to train the classifier,
leaving no independent data for validation. If set to a fraction (for example
0.6), then only that fraction of the data (here 60%, randomly selected) will be
used in training, and the remaining 40% will be held back as an independent
sample for testing.
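The effect of the training parameter can be illustrated with a simple random split of pixels. This is a minimal base-R sketch: split_training is a hypothetical helper, and get_pixels may sample differently (for example, by polygon rather than by individual pixel).

```r
# Hypothetical helper: flag a random `training` fraction of pixels for
# training; the remaining pixels are held back for testing.
split_training <- function(n_pixels, training = 0.6) {
  stopifnot(training > 0, training <= 1)
  n_train <- round(n_pixels * training)
  is_training <- rep(FALSE, n_pixels)
  is_training[sample.int(n_pixels, n_train)] <- TRUE
  is_training
}

set.seed(0)  # make the random split reproducible
flags <- split_training(1000, training = 0.6)
sum(flags)   # 600 pixels used for training
sum(!flags)  # 400 pixels held back for testing
```

With training = 1 every pixel is flagged for training, which is why no independent testing data remain in that case.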
Note: Validation data should generally be collected separately from training
data anyway, to ensure the image is randomly sampled (training data collection
is almost never random), so in most cases I don’t recommend making heavy use of
the training parameter. It can, however, be useful for testing.
A summary method is provided by
teamlucc for printing summary statistics on the extracted training data:
To perform the actual image classification, we will use the classify
function. Before using that function, we need to train a classifier. The
train_classifier function automates training a random forest or support
vector machine (SVM) classifier. There are many options that can be provided to
train_classifier, but for this example we will just use the defaults. The
default is to use a random forest classifier.
Now we can use the
classify function to perform the image classification:
To see the predicted classes, use
We can also see the class probabilities (per pixel probabilities of membership of each class):
The output from
classify also includes a table indicating the coding used for each class:
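The relationship between the per-pixel class probabilities, the predicted classes, and a coding table can be illustrated with a toy example. This sketch assumes the predicted class is simply the most probable one (true of a random forest's majority vote up to ties, but not necessarily exactly how classify assigns classes); probs and codes are made-up stand-ins, not the actual structure of the classify output.

```r
# Made-up class probabilities for 4 pixels (rows) and 2 classes (columns);
# each row sums to 1.
probs <- matrix(c(0.90, 0.10,
                  0.30, 0.70,
                  0.55, 0.45,
                  0.20, 0.80),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("Forest", "Non-forest")))

# Hypothetical coding table mapping integer codes to class names
codes <- data.frame(code = c(1L, 2L), class = colnames(probs),
                    stringsAsFactors = FALSE)

# Integer code of the most probable class for each pixel (ties go to the
# first column), then translated back to a class name
pred_code  <- max.col(probs, ties.method = "first")
pred_class <- codes$class[match(pred_code, codes$code)]
pred_class  # "Forest" "Non-forest" "Forest" "Non-forest"
```

Storing integer codes plus a lookup table keeps the output raster compact while still letting you recover the class names.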
Training a classifier and predicting land cover classes are both very CPU-intensive.
If your machine has multiple processors (or multiple cores), using more than one of them can substantially speed up some calculations.
teamlucc supports parallel computation (using the capabilities of the
raster package). To enable this functionality, first install the
doParallel package if it is not already installed, and load the package:
Now, just call
registerDoParallel(), and by default any calculations that are
coded to run in parallel will use half of the available CPUs on your machine.
You can also specify a number of CPUs to use, by running, for example,
registerDoParallel(2) to use two CPUs. The
classify functions in
teamlucc all support parallel computation and
will run in parallel automatically once you have called registerDoParallel().
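Here is a minimal sketch of what registering a parallel backend does, independent of teamlucc: once registerDoParallel() has been called, foreach loops written with %dopar% are spread across the registered workers. The loop is a toy computation, and the choice of two workers is arbitrary.

```r
library(doParallel)  # also attaches the foreach package

registerDoParallel(cores = 2)  # register two workers for this sketch
getDoParWorkers()              # number of registered workers

# A toy loop: each iteration may run on a different worker, and the
# results are combined into a single vector with c()
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}
squares  # 1 4 9 16

stopImplicitCluster()  # release the workers when finished
```

Code that uses %dopar% falls back to running sequentially (with a warning) if no backend has been registered, which is why the registration step comes first.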
Below is the code for the same classification problem we just ran, but this time we run the classification in parallel:
Conducting a thorough accuracy assessment is one of the most important
components of image classification. The
teamlucc package includes an
accuracy function to assist with measuring the accuracy of image
classifications. In addition to the standard contingency tables often used for
accuracy assessment, the accuracy function also calculates “quantity disagreement” and
“allocation disagreement” as introduced by Pontius and Millones (2011)¹.
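To make these measures concrete, here is a base-R sketch that computes user's, producer's, and overall accuracy, plus quantity and allocation disagreement following Pontius and Millones (2011), from a contingency table. The counts are made up, sample proportions are used directly (no population weighting), and accuracy_stats is a hypothetical helper rather than teamlucc's accuracy function.

```r
# Made-up contingency table of testing pixels:
# rows = predicted (map) class, columns = observed (reference) class
ct <- matrix(c(40, 10,
                5, 45),
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("Forest", "Non-forest"),
                             observed  = c("Forest", "Non-forest")))

# Hypothetical helper computing standard accuracy measures
accuracy_stats <- function(ct) {
  p <- ct / sum(ct)        # cell proportions
  map_tot <- rowSums(p)    # proportion mapped as each class
  ref_tot <- colSums(p)    # proportion observed as each class
  agree <- diag(p)         # correctly classified proportions
  list(
    users      = agree / map_tot,  # user's accuracy, per map class
    producers  = agree / ref_tot,  # producer's accuracy, per reference class
    overall    = sum(agree),
    # Pontius and Millones (2011)
    quantity   = sum(abs(map_tot - ref_tot)) / 2,
    allocation = sum(2 * pmin(map_tot - agree, ref_tot - agree)) / 2
  )
}

stats <- accuracy_stats(ct)
stats$overall     # 0.85
stats$quantity    # 0.05
stats$allocation  # 0.1
```

Quantity and allocation disagreement always sum to the total disagreement, 1 minus overall accuracy (here 0.05 + 0.10 = 0.15).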
Unbiased contingency tables can be calculated with
accuracy by supplying a
pop parameter giving the population frequencies of the classes. The
accuracy function also provides 95% confidence intervals for
user’s, producer’s, and overall accuracies, calculated as in Olofsson et al. (2013)².
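As an illustration of the idea, here is a base-R sketch of stratified estimators in the style of Olofsson et al. (2013): each sampled cell count is weighted by the population frequency (area proportion) W of its map class. The counts and W are hypothetical, this is not teamlucc's accuracy code, and the producer's-accuracy confidence interval (which has a longer variance formula) is left out to keep the sketch short.

```r
# Made-up sample counts of testing pixels:
# rows = map class, columns = reference class
n <- matrix(c(40, 10,
               5, 45),
            nrow = 2, byrow = TRUE,
            dimnames = list(map = c("Forest", "Non-forest"),
                            reference = c("Forest", "Non-forest")))

# Assumed population frequencies: share of the mapped area in each map class
W <- c(0.3, 0.7)

n_i <- rowSums(n)                  # testing pixels per map class
p <- sweep(n, 1, W / n_i, "*")     # estimated population cell proportions

overall   <- sum(diag(p))
users     <- diag(n) / n_i         # user's accuracy per map class
producers <- diag(p) / colSums(p)  # producer's accuracy per reference class

# Standard errors and 95% confidence intervals (normal approximation)
se_overall <- sqrt(sum(W^2 * users * (1 - users) / (n_i - 1)))
se_users   <- sqrt(users * (1 - users) / (n_i - 1))
ci_overall <- overall + c(-1.96, 1.96) * se_overall
ci_users   <- cbind(lower = users - 1.96 * se_users,
                    upper = users + 1.96 * se_users)

overall     # 0.87
colSums(p)  # estimated area proportion of each reference class: 0.31, 0.69
```

Compared with the unweighted table, overall and producer's accuracies shift because the sample of each map class is scaled to that class's share of the map, while user's accuracies are unchanged.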
To calculate a basic contingency table, assuming that population frequencies of
the observed classes can be estimated from the classification output, and using
the 40% of pixels that were excluded from training the classifier as testing
data, run the
accuracy function using the model calculated above:
Note the warning from
accuracy, which reminds us that we did not provide
population frequencies for the classes.
A summary method for the
accuracy object is provided by teamlucc. It
calculates user’s, producer’s, and overall accuracy, and quantity and allocation disagreement:
1. Pontius, R. G., and M. Millones. 2011. Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment. International Journal of Remote Sensing 32:4407-4429.
2. Olofsson, P., G. M. Foody, S. V. Stehman, and C. E. Woodcock. 2013. Making better use of accuracy data in land change studies: Estimating accuracy and area and quantifying uncertainty using stratified estimation. Remote Sensing of Environment 129:122-131.