In this lab, you are expected to learn/put in practice the following skills:
Give yourself a few minutes to think about what you learned about parallelization. List three examples of problems that you believe may be solved using parallel computing, and check for packages on the HPC CRAN task view that may be related to it.
Answer here.
The following functions can be written to be more efficient without
using parallel
:
n x k
dataset with all its
entries having a Poisson distribution with mean
lambda
.fun1 <- function(n = 100, k = 4, lambda = 4) {
x <- NULL
for (i in 1:n)
x <- rbind(x, rpois(k, lambda))
return(x)
}
fun1alt <- function(n = 100, k = 4, lambda = 4) {
# YOUR CODE HERE
}
# Benchmarking
microbenchmark::microbenchmark(
fun1(),
fun1alt()
)
How much faster?
Answer here.
max.col()
).# Data Generating Process (10 x 10,000 matrix)
set.seed(1234)
x <- matrix(rnorm(1e4), nrow=10)
# Find each column's max value
fun2 <- function(x) {
apply(x, 2, max)
}
fun2alt <- function(x) {
# YOUR CODE HERE
}
# Benchmarking
microbenchmark::microbenchmark(
fun2(),
fun2alt()
)
Answer here with a plot.
We will now turn our attention to non-parametric bootstrapping. Among its many uses, non-parametric bootstrapping allow us to obtain confidence intervals for parameter estimates without relying on parametric assumptions.
The main assumption is that we can approximate many experiments by resampling observations from our original dataset, which reflects the population.
This function implements the non-parametric bootstrap:
my_boot <- function(dat, stat, R, ncpus = 1L) {
# Getting the random indices
n <- nrow(dat)
idx <- matrix(sample.int(n, n*R, TRUE), nrow=n, ncol=R)
# Making the cluster using `ncpus`
# STEP 1: GOES HERE
# STEP 2: GOES HERE
# STEP 3: THIS FUNCTION NEEDS TO BE REPLACED WITH parLapply
ans <- lapply(seq_len(R), function(i) {
stat(dat[idx[,i], , drop=FALSE])
})
# Coercing the list into a matrix
ans <- do.call(rbind, ans)
# STEP 4: GOES HERE
ans
}
parallel
. Here is just an example for you to try:# Bootstrap of a linear regression model
my_stat <- function_for_lm
# DATA SIM
set.seed(1)
n <- 500
R <- 1e4
x <- cbind(rnorm(n))
y <- x*5 + rnorm(n)
# Check if we get something similar as lm
ans0 <- confint()
ans1 <- my_boot()
# your code here
Answer here.
Once you have saved this Rmd file, try running the following command in your terminal:
Rscript --vanilla -e 'rmarkdown::render("[full-path-to-your-Rmd-file.Rmd]")' &
Where [full-path-to-your-Rmd-file.Rmd]
should be replace
with the full path to your Rmd fileā¦ :).