Overview
This post outlines how to use the teamlucc package to remove thick clouds from Landsat imagery using the Neighborhood Similar Pixel Interpolator (NSPI) algorithm by Zhu et al.1. teamlucc includes the original IDL code by Xiaolin Zhu (modified slightly so that it can be called from R), as well as a native R/C++ implementation of the NSPI algorithm. Thanks to Xiaolin for permission to redistribute his code along with the teamlucc package.
Getting started
First load the teamlucc package, and the SDMTools package we will use later.
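(The raster package is also attached here; it is not mentioned above, but several of the examples below use its functions directly.)

```r
library(teamlucc)
library(SDMTools)
library(raster)
```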
If teamlucc is not installed, install it using devtools.
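For example, assuming the development version is hosted in the azvoleff/teamlucc repository on GitHub:

```r
if (!require(teamlucc)) {
    library(devtools)
    install_github('azvoleff/teamlucc')
}
```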
First I will cover how to cloud fill a single clouded image using a single clear (or partially clouded) image. Skip to the end to see how to automate the cloud fill process using teamlucc.
Cloud fill a single clouded image with a single clear image
This example will use a portion of a 1986 Landsat 5 scene from Volcan Barva, Costa Rica (a TEAM Network monitoring site). The scene is WRS-2 path 15, row 53. Particularly in the tropics, it can sometimes be difficult to find a Landsat image that is cloud-free. Cloud filling can offer a solution to this problem if there are multiple Landsat scenes of an area of interest that, taken together, offer a cloud-free (or nearly cloud-free) view of the area. Throughout this post I will refer to the “base” and the “fill” images. The “base” image is a cloudy image that will be filled using images (the “fill” images) of the same area that were captured on different dates.
While it can sometimes be possible to find a cloud-free scene from a different part of the year that can be used to fill in a cloudy scene from an earlier or later base date, it is often the case that both the fill and base image will have clouds. Therefore we must use cloud masks to mark clouded areas in both the base and the fill image. Without a cloud mask for the fill image, we could inadvertently fill clouded areas in the base image with pixels that are also cloudy in the fill image.
The base (cloudy) image for this example is from January 5, 1986, and the fill image is from January 21, 1986. The images are surface reflectance images from the Landsat Surface Reflectance Climate Data Record (CDR), that also include cloud masks constructed with the Function of Mask (fmask) algorithm2. Both of these images have significant cloud cover, and some areas are cloudy in both images. This example will show the ability of the cloud fill algorithms to function even in difficult circumstances.
To follow along with this analysis, download these files. Note that the original CDR reflectance images have been rescaled to range between 0 and 255 in the files supplied here (this rescaling is not required prior to performing cloud fill - I just did it here to make the file sizes smaller so they could be more easily hosted on this site).
Load input data
First load the base and fill images into R:
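(The filenames below are placeholders - substitute the names of the files you downloaded.)

```r
# Load the base (cloudy) and fill images as multi-band RasterBricks
base_img <- brick('base_image.tif')
fill_img <- brick('fill_image.tif')
```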
Notice the cloud cover in the base image. The fill image also has cloud cover, but less than the base image - there are areas of the fill image that can be used to fill clouded pixels in the base image:
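To view both images as true-color composites (the band indices below assume bands 1-3 of each brick are the Landsat blue, green, and red bands - adjust them to match your band stacking):

```r
plotRGB(base_img, r=3, g=2, b=1)
plotRGB(fill_img, r=3, g=2, b=1)
```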
Topographic correction
In mountainous areas, topographic correction should be performed prior to cloud fill1. teamlucc supports performing topographic correction using algorithms derived from those in the [landsat package](http://cran.r-project.org/web/packages/landsat/index.html) by Sarah Goslee3.
To perform topographic correction, use the topographic_corr function in teamlucc. First load the slope and aspect rasters:
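(Placeholder filenames again; the slopeaspect stack built here is passed to topographic_corr below, which is an assumption about that function's interface - see ?topographic_corr.)

```r
slope <- raster('slope.tif')
aspect <- raster('aspect.tif')
slopeaspect <- stack(slope, aspect)
```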
Now call the topographic_corr function twice, to topographically correct both the base and fill images. Note that the sun elevation angle and sun azimuth (both in degrees) must be supplied - values for these parameters can be found in the metadata accompanying your imagery. See ?topographic_corr for more information. DN_min and DN_max can be used to ensure that invalid values are not generated by the topographic correction routine (which can sometimes be a problem in very heavily shadowed areas, or in very bright areas, such as clouds).
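A sketch of the two calls is below. The argument names (slopeaspect, sunelev, sunazimuth) and the sun angles are assumptions for illustration - take the correct values from ?topographic_corr and your image metadata:

```r
# Topographically correct the base image (sun elevation and azimuth in degrees,
# taken from the image metadata - the values here are placeholders)
base_tc <- topographic_corr(base_img, slopeaspect, sunelev=47, sunazimuth=134,
                            DN_min=0, DN_max=255)
# Topographically correct the fill image
fill_tc <- topographic_corr(fill_img, slopeaspect, sunelev=45, sunazimuth=132,
                            DN_min=0, DN_max=255)
```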
Construct cloud masks
The fmask band from the CDR imagery uses the following codes:
Pixel type | Code |
---|---|
Clear land | 0 |
Clear water | 1 |
Cloud shadow | 2 |
Snow | 3 |
Cloud | 4 |
No observation | 255 |
We need to construct a mask in which all pixels that are cloud (code 4) or cloud shadow (code 2) are equal to 1, and all other pixels are equal to zero. This is easy using raster algebra from the R raster package. First load the masks:
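(Placeholder filenames for the fmask bands included in the download.)

```r
base_fmask <- raster('base_fmask.tif')
fill_fmask <- raster('fill_fmask.tif')
```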
Now do the raster algebra, masking out clouds and cloud shadows, and setting missing values in both images to NAs in the masks:
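(One possible approach, following the fmask codes listed above.)

```r
# Clouds (code 4) and cloud shadows (code 2) become 1; everything else is 0
base_cloud_mask <- (base_fmask == 2) | (base_fmask == 4)
fill_cloud_mask <- (fill_fmask == 2) | (fill_fmask == 4)
# Code cells with no observation (code 255) in either image as NA in both
# masks, so background areas are excluded from the fill
base_cloud_mask[(base_fmask == 255) | (fill_fmask == 255)] <- NA
fill_cloud_mask[(base_fmask == 255) | (fill_fmask == 255)] <- NA
```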
Plot these masks to double-check they align with the clouds in the images we viewed earlier:
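(A quick check with the raster package's plot method.)

```r
plot(base_cloud_mask)
plot(fill_cloud_mask)
```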
Now use these two masks to mask out the clouds in the fill and base images, by setting clouded areas to zero (as the cloud_remove code treats pixels with zero values as “background”):
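(A sketch using the raster package's mask function on the topographically corrected images.)

```r
# Set clouded pixels to zero ("background") in both images
base_tc <- mask(base_tc, base_cloud_mask, maskvalue=1, updatevalue=0)
fill_tc <- mask(fill_tc, fill_cloud_mask, maskvalue=1, updatevalue=0)
```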
The cloud mask for the base image must be constructed so that each cloud has
its own unique integer code, with codes starting from 1. This process can be
automated using the ConnCompLabel
function from the SDMTools
package.
However, because there are clouds in our fill image as well as in our base
image, we need to modify the base_cloud_mask
slightly to account for this.
First, code all pixels in base_cloud_mask that are clouded in fill_cloud_mask as NA. This will tell the ConnCompLabel function not to label these pixels as clouds (because they are also clouded in the fill image, we cannot perform cloud fill on these pixels).
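For example:

```r
# Flag pixels that are also clouded in the fill image so that ConnCompLabel
# will not label them as fillable clouds
base_cloud_mask[!is.na(fill_cloud_mask) & (fill_cloud_mask == 1)] <- NA
```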
Now run ConnCompLabel, and set the output datatype to INT2S (meaning the data in base_cloud_mask can range from -32768 to 32767). That said, please don’t try to run cloud fill with 32,767 clouds in your image :).
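A sketch of this step is below. The recoding of fill-clouded pixels to -1 is my addition so that the final mask matches the coding shown in the table that follows:

```r
# Give each cloud in the base image its own integer label (1 ... n)
base_cloud_mask <- ConnCompLabel(base_cloud_mask)
# Recode pixels that are clouded in the fill image (flagged NA above) as -1
base_cloud_mask[!is.na(fill_cloud_mask) & (fill_cloud_mask == 1)] <- -1
# Save with a 16-bit signed integer datatype
base_cloud_mask <- writeRaster(base_cloud_mask, filename='base_cloud_mask.tif',
                               datatype='INT2S', overwrite=TRUE)
```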
The final base_cloud_mask is now coded as:

Pixel type | Code |
---|---|
Background in fill or base | NA |
Clouded in fill | -1 |
Clear in base, clear in fill | 0 |
Clouded in base | 1 … n |

where n is the number of clouds in the image.
Fill clouds
For this simple example, we will directly use the cloud_remove
function in
teamlucc
. This function has a number of input parameters that can be supplied
(see ?cloud_remove
). Two important ones to note are DN_min
and DN_max
.
These are the minimum and maximum valid values, respectively, that a pixel in
the image can take on. These limits are used to ignore unrealistic predictions
that may arise in the cloud fill routine. For the base and fill images we are
working with here, these values are 0 and 255 for min and max, respectively.
Set these parameters to appropriate values as necessary for the images you are
working with.
There are four different cloud fill algorithms that can be used from teamlucc. Two require an IDL installation, while the other two use cloud fill algorithms that are native to R (the teamlucc NSPI implementation is coded in C++ for speed reasons). The R-based NSPI algorithm is a bit more flexible than the IDL algorithms, and is designed to handle images in which both the base and fill image have clouds. Based on the options supplied to cloud_remove, teamlucc will select one of the four algorithms to run. The algorithm parameter to cloud_remove determines which cloud fill algorithm is used:
Algorithm | Requires IDL license? | Algorithm used by cloud_remove |
---|---|---|
CLOUD_REMOVE | Yes | CLOUD_REMOVE1 |
CLOUD_REMOVE_FAST | Yes | CLOUD_REMOVE_FAST1 |
teamlucc | No | teamlucc fill algorithm |
simple | No | simple linear model algorithm |
First I will review the two IDL-based algorithms, then I will discuss the two R-based algorithms.
Cloud removal using IDL code
If run with algorithm="CLOUD_REMOVE"
(the default), cloud_remove
runs an
IDL script provided by Xiaolin Zhu. For
R to be able to run this script it must know the path to IDL on your machine.
For Windows users, this means the path to “idl.exe”. To specify this path you
will need to provide the idl
parameter to the cloud_remove
script. The
default value (C:/Program Files/Exelis/IDL83/bin/bin.x86_64/idl.exe
) may or
may not work on your machine. I recommend you set the IDL path at the beginning
of your scripts:
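(For example, on a Windows machine with IDL 8.3 in the default location.)

```r
# Path to the IDL executable - change this to match your IDL installation
idl_path <- "C:/Program Files/Exelis/IDL83/bin/bin.x86_64/idl.exe"
```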
An optional out_name
parameter can be supplied to cloud_remove
to specify
the filename for the output file. If not supplied, R will save the filled image
as an R object pointing to a temporary file.
To run the cloud removal routine, call the cloud_remove
function with the
appropriate parameters. Note that this computation may take some time (it takes
around 1.5 hours on a 2.9 GHz Core i7-3520M laptop).
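A sketch of the call is below. The order of the first three arguments (cloudy base image, clear fill image, cloud mask) is my assumption - check ?cloud_remove for the actual argument names:

```r
# Fill clouds in the base image using the IDL NSPI script
base_filled <- cloud_remove(base_tc, fill_tc, base_cloud_mask,
                            algorithm="CLOUD_REMOVE", idl=idl_path,
                            DN_min=0, DN_max=255)
```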
Use plotRGB
to check the output. Note that IDL does not properly code missing
values in the output - prior to plotting or working with the data be sure to
set any pixels with values less than DN_min
(here DN_min
is zero) to NA
:
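(Using clamp from the raster package to recode values below zero as NA.)

```r
# Values below DN_min (zero here) are invalid - recode them as NA, then plot
base_filled <- clamp(base_filled, lower=0, useValues=FALSE)
plotRGB(base_filled, r=3, g=2, b=1)
```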
The default cloud fill approach can take a considerable amount of time to run.
There is an alternative approach that can take considerably less time to run,
with similar results. This option can be enabled by supplying the
algorithm="CLOUD_REMOVE_FAST
parameter to cloud_remove
.
The “fast” version of the algorithm makes some simplifications to improve
running time. Specifically, rather than follow the precise algorithm as
outlined by Zhu et al.1, the “fast” routine uses k-means clustering to
divide the image into the number of classes specified by the num_class
parameter. The script then constructs a linear model of the temporal change in
reflectance for each class within the neighborhood of a given cloud. This
“temporal” adjustment is complemented by a “spatial” adjustment that considers
the change in reflectance in a small neighborhood around each clouded pixel.
For each clouded pixel, a weighted combination of the predicted fill values
from the spatial and temporal models determines the final predicted value for
that pixel. This version of the algorithm takes only 2.5 minutes to run on the
same machine as used above:
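(Same assumptions about argument order as above; num_class is set to an illustrative value.)

```r
# Faster IDL-based fill that clusters the image into num_class classes
base_filled_fast <- cloud_remove(base_tc, fill_tc, base_cloud_mask,
                                 algorithm="CLOUD_REMOVE_FAST", idl=idl_path,
                                 num_class=4, DN_min=0, DN_max=255)
```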
Use plotRGB
to check the output:
Cloud removal using native R code
If you do not have IDL on your machine, there is a C++ implementation of the
NSPI cloud fill algorithm that will run directly in R, as well as a “simple”
cloud fill algorithm that uses linear models developed using the neighborhood
of each cloud to perform a naive fill. To run the R version of the NSPI
algorithm, call the cloud_remove
function with the same parameters as above,
but specify algorithm="teamlucc"
. This function also has a verbose=TRUE
option to tell cloud_remove
to print progress statements as it is running
(this option is not available with the IDL scripts shown above). This version
is nearly identical to the IDL algorithm called with the
algorithm="CLOUD_REMOVE"
option, but it takes much less time to run (only 3-4 minutes on my machine).
Note that when cloud_remove
is run with algorithm="teamlucc"
and
verbose=TRUE
, there will be a large number of status messages printed to the
screen. For the purposes of this demo (so that the webpage is not unnecessarily
long), I have not used the verbose=TRUE
argument, but I recommend using it if
you try this command yourself.
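A sketch of the call (no IDL path is needed here; argument order is again my assumption):

```r
# Native R/C++ NSPI fill; add verbose=TRUE to print progress messages
base_filled_tl <- cloud_remove(base_tc, fill_tc, base_cloud_mask,
                               algorithm="teamlucc", DN_min=0, DN_max=255)
```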
View the results with plotRGB
:
The fastest cloud fill option is to run cloud_remove
with
algorithm="SIMPLE"
. This uses a simple cloud fill approach in which the value
of each clouded pixel is calculated using a linear model. The script develops a
separate linear model (with slope and intercept) for each band and each cloud.
For each cloud, and each image band, the script finds all pixels clear in both
the cloudy and fill images, and calculates a regression model in which pixel
values in the fill image are the independent variable, and pixel values in the
clouded image are the dependent variable. The script then uses this model to
predict pixel values for each band in each cloud in the clouded image. For
example:
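(Argument order is again my assumption - see ?cloud_remove.)

```r
# Simple per-cloud, per-band linear model fill
base_filled_simple <- cloud_remove(base_tc, fill_tc, base_cloud_mask,
                                   algorithm="simple", DN_min=0, DN_max=255)
```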
View the results with plotRGB
:
Compare all four fill algorithms
To plot the results of all four fill algorithms, make a layer stack of the first band of all four images, then plot:
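(One way to do this; the object names match the sketches above.)

```r
# Stack band 1 from each of the four filled images and plot them side by side
fill_comparison <- stack(raster(base_filled, layer=1),
                         raster(base_filled_fast, layer=1),
                         raster(base_filled_tl, layer=1),
                         raster(base_filled_simple, layer=1))
names(fill_comparison) <- c('CLOUD_REMOVE', 'CLOUD_REMOVE_FAST',
                            'teamlucc', 'simple')
plot(fill_comparison)
```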
Automated cloud fill from an image time series
The teamlucc
package also includes functions for automated cloud filling from
an image time series. Automatic cloud filling is performed using the
auto_cloud_fill
function. This function automates the majority of the cloud
filling process. As multiple images are required to demonstrate this process,
the images required for this portion of the example are not available for
download from this site. I suggest you download the appropriate imagery for a
particular study site and preprocess the imagery using the auto_setup_dem
and
auto_preprocess_landsat
functions in the teamlucc
package so that you can
follow along with this example. The auto_preprocess_landsat
function will
also perform topographic correction, which is necessary prior to cloud filling
images in mountainous areas.
The auto_cloud_fill
function allows an analyst to automatically construct a
cloud-filled image after specifying: data_dir
(a folder of Landsat
images), wrspath
and wrsrow
(the WRS-2 path/row to use), and start_date
and end_date
(a start and end date limiting the images to use in the
algorithm). The analyst can also optionally specify a base_date
, and the
auto_cloud_fill
function will automatically pick the image closest to that
date to use as the base image (otherwise auto_cloud_fill
will automatically
pick the image with the least cloud cover as the base image).
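A hypothetical call is sketched below - the parameter names are those described above, while the paths and dates are placeholders (see ?auto_cloud_fill for details):

```r
# Fill clouds for WRS-2 path 15, row 53 using images from early 1986, with
# the image closest to January 5, 1986 as the base image
filled <- auto_cloud_fill(data_dir='D:/Landsat/Volcan_Barva',
                          wrspath=15, wrsrow=53,
                          start_date=as.Date('1986-01-01'),
                          end_date=as.Date('1986-06-30'),
                          base_date=as.Date('1986-01-05'))
```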
As the auto_cloud_fill
function automatically chooses images for inclusion in
the cloud fill process, it relies on having images stored on disk in a
particular way, and currently only supports cloud fill for Landsat CDR surface
reflectance images. To ensure that images are correctly stored on your hard
disk, use the auto_preprocess_landsat
function to extract the original
Landsat CDR hdf files from the USGS archive. The auto_preprocess_landsat
function will ensure that images are extracted and renamed properly so that
they can be used with the auto_cloud_fill
script.
1. Zhu, X., Gao, F., Liu, D., Chen, J., 2012. A modified neighborhood similar pixel interpolator approach for removing thick clouds in Landsat images. IEEE Geoscience and Remote Sensing Letters 9, 521–525. doi:10.1109/LGRS.2011.2173290
2. Zhu, Z., Woodcock, C.E., 2012. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sensing of Environment. doi:10.1016/j.rse.2011.10.028
3. Goslee, S.C., 2011. Analyzing Remote Sensing Data in R: The landsat Package. Journal of Statistical Software 43(4), 1–25.