Do you really need to get the inverse of that weight matrix?

Motivation

When you run simulations with spatially dependent data, you can use a weight matrix to quantify the nature and degree of dependence between observations. For example, suppose you would like to generate spatially correlated shocks (errors), \(\varepsilon\), specified as follows:

\[ \varepsilon = \lambda W \cdot \varepsilon + u \]

where \(W\) is the weight matrix, \(\lambda \in [0,1]\) is a parameter that determines the degree of spatial dependence, and \(u \sim N(0,\sigma^2)\). A bit of of arithmetic operations of this equation leads to the following:

\[ \varepsilon = (I-\lambda W)^{-1} \cdot u \]

So, one way to calculate \(\varepsilon\) is to first calculate \((I-\lambda W)^{-1}\), and then multiply it with \(u\). This approach, however, can be very slow and also memory-inefficient. In this blog post, I will discuss these problems and also introduce an alternative that works better under certain circumstances.

Here are the packages we will use in this blog post:

library('rgdal')
library('spdep')
library('magrittr')
library('data.table')

When the weight matrix is not so large

Suppose you are working with the following gridded field, where each grid represents an observation.

plot(field)

We would like to generate spatially dependent error for this field. First, let’s create a weight matrix.

#--- get the centroid of each grid ---#
cent <- getSpPPolygonsLabptSlots(field)

#--- identify neighbors that are located within the 10-meter radius circle buffer ---#
dnb <- dnearneigh(cent,0,20)
dsts <- nbdists(dnb,cent)
idw <- lapply(dsts, function(x) 1/x)

#--- generate a weight matrix ---#
W <- nb2mat(dnb,glist=idw, style="W")


We now define \(\lambda\) and \(u\).

#--- lambda ---#
lambda <- 0.8

#--- # of observations (# of grids) ---#
N <- nrow(W)

#--- normally distributed independent errors ---#
u <- rnorm(N)


In order to calculate \(\varepsilon\), you can first find the inverse of \(I-\rho W\), and then multiply it by \(u\). The spdep package offers the invIrW function, which calculates the inverse of \(I-\rho W\).

(#--- time the code ---#
st_1_60 <- system.time(
    #--- find the inverse ---#
    inv <- invIrW(W,lambda)
    )
)
##    user  system elapsed 
##  24.277   0.151  24.448


As you can see, it took 24.45 seconds on my laptop computer (MacBook Pro, Mid 2015, quad-core, 2.8 GHz Intel Core i7) to calculate the inverse. But, once you do that, the next step is very fast as it’s just a matrix multiplication.

#--- multiply the inverse with u to get e---#
e <- inv %*% u


An alternative to this approach does not require the calculation of the inverse. Rewriting the equation above, we can write as follows:

\[ (I-\lambda W) \cdot \varepsilon = u \]

Note that we know the value of \(I\), \(\lambda\), \(W\), and \(u\). \(\varepsilon\) is the unknown that we want find the value of. Yes, this is a system of linear equations with 1886 unknowns. For a computer, it is so much easier to solve the system of linear equations than finding the inverse of a large matrix. To solve such a system, you can just use the solve() function from the base package as follows:

(
st_2_60 <- system.time(
    e <- solve(diag(N)-lambda*W,u)
    )
)
##    user  system elapsed 
##   1.454   0.011   1.466


Okay, so this is a lot faster!! Does this mean you should always pick the second option? Not, so fast. When you run simulations with uncertainty, you run the same process many many times so that you can get on-average results. Suppose, the only thing that changes across iterations is the error term (\(u\)), but \(W\) stays the same since you are working on the same field. In this case, the first option needs to calculate the inverse, but once it is calculated, you simply need to repeat the quick matrix multiplication many times. On the other hand, you need to solve the system of linear equations repeatedly if you go for the second option. So, the fair comparison when the number of iteration is, say 100 (100 is rather small for a number of iterations, but it’s big enough to show the point here.), is as follows:

#===================================
# First option
#===================================
#--- find the inverse ---#
option_1 <- function(){
    inv <- invIrW(W,lambda)

    for (i in 1:100){
        u <- rnorm(N)
        e <- inv %*% u
    }
}   

system.time(
    option_1()
    )
##    user  system elapsed 
##  25.882   0.063  25.957


#===================================
# Second option
#===================================
option_2<- function(){
    A <- diag(N)-lambda*W
    for (i in 1:100){
        u <- rnorm(N)
        e <- solve(A,u)
    }
}   

system.time(
    option_2()
    )
##    user  system elapsed 
## 110.804   1.436 112.670


So, the clear winner is the first option. Moreover, as the number of iterations gets larger, the comparative disadvantage of the second option is greater. Okay, so now you may wonder why I even introduced the second option. The real benefit of using the second option becomes apparent when the size of weight matrix is much larger. Now, before I delve into this, let me talk about a sparse matrix a little bit and how it may improve both in terms of memory and computation time.

Using Sparse Matrix

A sparse matrix is a matrix with most of its entries being zero. In most cases, when we work with a weight matrix in spatial econometrics or simulations, it is a sparse matrix, just like ours here.

#--- number of zero entries ---#
num_zeros <- sum(unlist(W)==0) 

#--- percentage ---#
num_zeros/(N*N)
## [1] 0.9979269

Yes, so, almost all the entries are indeed zero. When most of the entries are zero, there are much more efficiency ways of storing the same data (see here for various data representation format).

In \(R\), you can use the Matrix() function from the Matrix package to convert a full matrix into a sparse matrix:

#--- convert a regular matrix to a sparse matrix ---#
W_sparse <- Matrix(W,sparse=TRUE)


You can check the representation method under use by applying the class() function:

#--- check the class ---#
class(W_sparse)
## [1] "dgCMatrix"
## attr(,"package")
## [1] "Matrix"


So, it’s dgCMatrix, which is a general sparse matrix of doubles, in CSC representation. Now, let’s compare the size of the non-sparse and sparse matrices.

#--- full ---#
object.size(W)
## 28562760 bytes
#--- sparse ---#
object.size(W_sparse)
## 203112 bytes


As you can see, \(W\_sparse\) takes up only about \(1/15\) of the RAM memory. The memory saving here has no practical benefit because 28MB is really nothing for modern computers. But, imagine that you have to work with a 20 GB matrix or a number of 5 GB matrices at the same time. Or you may want to parallelize an operation that uses a 5 GB matrix, distributed across 4 cores. Do you have enough RAM memory on your computer? It is increasingly common to use a big data like this in research. Indeed, I had one of my students hit this wall, which motivated me to write this blog post. Now, the benefit of a sparse matrix representation does not just end with memory saving. It also facilitates faster matrix operations. To see this, redo the same operations we did before except that we use \(W\_sparse\), instead of \(W\).

#--- time the code ---#
system.time(
    #--- find the inverse ---#
    inv <- invIrW(W_sparse,lambda)
    )
##    user  system elapsed 
##  13.438   0.180  13.666


#--- time the code ---#
system.time(
    e <- solve(Diagonal(N)-lambda*W_sparse,u)
    )
##    user  system elapsed 
##   0.011   0.001   0.012


You can clearly see the improvement in computational time for both cases, but especially the second option. Notice here that I am using the Diagonal() function from the Matrix package to create the \(N\) by \(N\) diagonal sparse matrix. Now, let’s evaluate how long it takes to complete \(100\) iterations of spatial error generation using the sparse matrices.

#===================================
# Second option
#===================================
option_2_sparse <- function(){
    A <- Diagonal(N)-lambda*W
    for (i in 1:100){
        u <- rnorm(N)
        e <- solve(A,u)
    }
}   

#--- time the code ---#
(
    st_2_60_sp <- system.time(
        option_2_sparse()
        )
)
##    user  system elapsed 
##   0.381   0.069   0.450


Wonderful! Using sparse matrices, it now takes only 0.45 seconds to complete \(100\) iterations, which clearly is much faster than the first option using the sparse matrix.

When the weight matrix is much larger


Now, instead of 60-60 feet grids, suppose we would like to generate data at finer spatial resolution, say 10-10 feet grids, named \(field\_10\) here. Here is the map of the grids:

plot(field_10)


Here is the number of the dataset.

#--- number of observations ---#
N <- nrow(field_10@data)
N 
## [1] 66195


Notice that we now have \(66,195\) observations. This means that the weight matrix for this data will be \(66,195\) by \(66,195\).

#--- get the centroid of each grid ---#
cent <- getSpPPolygonsLabptSlots(field_10)

#--- identify neighbors that are located within the 2-meter radius circle buffer ---#
dnb <- dnearneigh(cent,0,4)
dsts <- nbdists(dnb,cent)
idw <- lapply(dsts, function(x) 1/x)


When the number of observations is this large, just creating a full weight matrix takes a long time. The following code would attempt to a full weight matrix, which would be about 35GB in size. You certainly do not want to do this.

W <- nb2mat(dnb,glist=idw,style = "W")


Instead, you could use the nb2listw() function, which produces a listw object. listw stores data in a similar manner as the compressed row oriented representation (CSR). Consequently, it is much more efficient in memory.

(
st_listw_10 <- system.time(
    W <- nb2listw(dnb,glist=idw,style = "W")
    )
)
##    user  system elapsed 
##   6.340   0.066   6.410
#--- object size (in GB) ---#
object.size(W)/1e6
## 28.1 bytes


So, it takes 6.41 seconds, and it is only about 30.8 MB instead of 35 GB. However, the spdep offers a better option. You can create a listw object first and then coerce it into a sparse matrix as follows:

#--- create a sparse weight matrix ---#
W_sparse <- nb2listw(dnb,glist=idw,style = "W") %>% 
    as('CsparseMatrix')

#--- check the class ---#
class(W_sparse)
## [1] "dgCMatrix"
## attr(,"package")
## [1] "Matrix"
#--- check the memory size ---#
object.size(W_sparse)/1e6
## 10.9 bytes


The memory size of the sparse weight matrix is only 11.93 MB, which is much smaller than the listw object. Now, let’s look at the performance of the use of the sparse matrix in terms of computational time.

Okay, let’s try the finding-the-inverse option first.

#--- time the code ---#
system.time(
    #--- find the inverse ---#
    inv <- invIrW(W_sparse,lambda)
    )
## Error in asMethod(object): Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
## Timing stopped at: 0.906 0.146 1.06


Right, you cannot even calculate it. It is much larger than my RAM. Moreover, even if my RAM is big enough to hold the resulting object, I cannot even imagine how long finding the inverse would take. Okay, so we just found that getting the inverse of a very large matrix is practically infeasible. However, the second option still works!

#--- generate u ---#
u <- rnorm(N)

#--- generate coefficient matrix ---#
A <- Diagonal(N)-lambda*W_sparse

#--- time the code ---#
(
st_2_10_sparse <- system.time(
    e <- solve(A,u)
    )
)
##    user  system elapsed 
##   1.779   0.207   1.987


Great! It is not only feasible to calculate \(e\), it also took only 1.987 seconds. That’s something. Keep in mind here that you need to use the Diagonal() function from the Matrix package to create the \(N\) by \(N\) diagonal matrix. If you use diag() instead, \(R\) will attempt to create a full (non-sparse) diagonal matrix, which is over 30GB in size.

Summary

  • Use sparse matrices to save both memory and computational time
  • Do not get the inverse of a matrix, just solve the system of linear equations



Session Information

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.14.1
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] data.table_1.11.8 magrittr_1.5      spdep_0.8-1       spData_0.2.9.4   
## [5] Matrix_1.2-15     rgdal_1.3-6       sp_1.3-1         
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0        knitr_1.20        gmodels_2.18.1   
##  [4] splines_3.5.1     MASS_7.3-50       lattice_0.20-35  
##  [7] stringr_1.3.1     tools_3.5.1       grid_3.5.1       
## [10] nlme_3.1-137      xfun_0.4          coda_0.19-2      
## [13] deldir_0.1-15     gtools_3.8.1      htmltools_0.3.6  
## [16] yaml_2.2.0        rprojroot_1.3-2   digest_0.6.18    
## [19] bookdown_0.7      evaluate_0.12     rmarkdown_1.10   
## [22] blogdown_0.9      gdata_2.18.0      stringi_1.2.4    
## [25] compiler_3.5.1    LearnBayes_2.15.1 backports_1.1.2  
## [28] boot_1.3-20       expm_0.999-3