First we start an RStudio AMI:
Start a spot instance:
Request:
Search AMI:
Search for RStudio:
Select the needed compute power:
It's recommended to look at the pricing history:
In this case, we select a General Purpose 16-CPU machine, which has a fair price per hour. Since we only use the machine for some heavy processing, we will decommission it in a few hours.
Leave the other settings as is.
Select Next. Be sure to create a new key pair if you do not have the original key pair.
It's also good to add a new security group which has port 80 open:
Open the ports:
Then click 'Launch', and then click on the capacity link:
Look up the public IP:
You should now be able to log on:
Log on credentials can be found here:
http://www.louisaslett.com/RStudio_AMI/
Now that the server is up and running we can start using R packages to allow parallel processing:
install.packages("doParallel",dependencies=TRUE)
install.packages("doMC",dependencies=TRUE)
We use the following packages:
library(doParallel)
library(foreach)
library(doMC)
In order to initialize, we:
- set the number of cores,
- initialize a cluster,
- and export our functions, variables and datasets to the cluster workers:
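# Prep parallel processing
# Calculate the number of cores, leaving one free for the system
no_cores <- detectCores() - 1
# Initiate cluster (used by cluster functions such as parLapply)
cl <- makeCluster(no_cores)
# Register the multicore backend that foreach's %dopar% will use
registerDoMC(no_cores)
# Export our functions to the cluster workers
clusterExport(cl=cl, varlist=c("splitter", "create_ngram_table"))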
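The role of clusterExport is easiest to see in a small, self-contained run. The following is a minimal sketch, not the code used on the server: the helper square_plus and the variable offset are hypothetical stand-ins for the post's own functions and data, and the worker count is capped at 2 so it also runs on a small machine. It only uses the parallel package that ships with R.

```r
library(parallel)

# Hypothetical helper standing in for functions such as splitter
square_plus <- function(x, offset) x^2 + offset
offset <- 10

# Cap the worker count at 2 for this sketch (assumption, not from the post)
no_cores <- max(1, min(2, detectCores() - 1, na.rm = TRUE))
cl <- makeCluster(no_cores)

# Workers are fresh R sessions, so they need the helper and the variable
clusterExport(cl = cl, varlist = c("square_plus", "offset"),
              envir = environment())

# Each element of 1:4 is handled by one of the workers; a list is returned
res <- parLapply(cl, 1:4, function(i) square_plus(i, offset))

# Release the worker processes when done
stopCluster(cl)

values <- unlist(res)
print(values)
```

Calling stopCluster at the end matters on a paid instance as well: the worker processes otherwise stay alive until the R session ends.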
There are a number of options for parallel processing; roughly speaking, you can choose between parLapply and foreach. When you want to split your dataset into multiple subsets and have a worker node perform an operation on each of them, this can easily be done with foreach, see the example below. In both cases a list is returned, while we usually want a full dataset in return. To achieve this, I used the code do.call("rbind", result).
Many more examples can be found on: http://gforge.se/2015/02/how-to-go-parallel-in-r-basics-tips/
result = foreach(j=seq(1,nl, by=(nl/parts))) %dopar% {
  # ... perform the operation on the subset starting at index j ...
}
do.call("rbind", result)
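To make the split, process and rbind pattern concrete, here is a minimal end-to-end sketch. It rests on several assumptions that are not in the original: the toy data frame df, the column value_sq, the chunk size computed with ceiling so that the step is always a whole number, and a two-worker doParallel cluster as the backend instead of doMC.

```r
library(doParallel)  # also attaches foreach and parallel

# Toy data set standing in for the real one (hypothetical)
df <- data.frame(id = 1:12, value = (1:12) * 2)
nl <- nrow(df)
parts <- 4
chunk <- ceiling(nl / parts)  # rows per subset

cl <- makeCluster(2)
registerDoParallel(cl)

# One iteration per subset; each returns a small data frame
result <- foreach(j = seq(1, nl, by = chunk)) %dopar% {
  rows <- j:min(j + chunk - 1, nl)
  piece <- df[rows, ]
  piece$value_sq <- piece$value^2
  piece
}

stopCluster(cl)

# foreach returned a list of data frames; stack them into one
full <- do.call("rbind", result)
print(dim(full))
```

foreach also accepts a .combine argument (for example .combine = rbind), which stacks the pieces as they come back and makes the separate do.call step unnecessary.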