Showing posts with label Amazon. Show all posts

Saturday, April 1, 2017

AWS, R, RStudio, Parallel processing

In this post I will share my experiences with using a spot AMI instance for heavy parallel processing of R scripts.

First we start an RStudio AMI:



Start a spot instance:




Request:

Search AMI:


Search for RStudio:


Select the needed compute power:



It's recommended to look at the pricing history:



In this case, we select a General Purpose 16 CPU machine, which has a fair price per hour. Since we only use the machine for some heavy processing, we will decommission it in a few hours.



Leave the other settings as is.

Select 'Next'. Be sure to create a new key pair if you do not have the original key pair.
It's also good to add a new security group which has port 80 open:



Open the ports:


Then click 'Launch' to launch the instance, and then click on the capacity link:

Look up the public IP:

You should now be able to log on:

Log on credentials can be found here:
http://www.louisaslett.com/RStudio_AMI/

Now that the server is up and running we can start using R packages to allow parallel processing:
  
install.packages("doParallel",dependencies=TRUE)
install.packages("doMC",dependencies=TRUE)

We load the packages:
  
library(doParallel)
library(foreach)
library(doMC)

To initialize, we:
- set the number of cores,
- initialize a cluster,
- export our functions, variables and datasets to the cluster workers:
  
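# Prepare parallel processing
# Calculate the number of cores, leaving one free for the system
no_cores <- detectCores() - 1
# Initiate a cluster (used by parLapply)
cl <- makeCluster(no_cores)
# Register the multicore backend (used by foreach's %dopar%)
registerDoMC(no_cores)
# Export our own functions to the cluster workers
clusterExport(cl=cl, varlist=c("splitter", "create_ngram_table"))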

There are a number of options for parallel processing. Roughly speaking, you can choose between parLapply and foreach. When you would like to split your dataset into multiple subsets and have a worker node perform an operation on each, this can easily be done with foreach; see the example below. In both cases a list is returned, while we usually would like to have a full dataset in return. To achieve this, I used the code do.call("rbind", result).
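As a rough illustration of the parLapply route, here is a minimal sketch. It uses only the base parallel package; square_value is a hypothetical stand-in for your own function, and the worker count is capped at 2 so the sketch also runs on a small machine.

```r
library(parallel)

# Hypothetical stand-in for your own function (not from the original post).
square_value <- function(x) x^2

# Cap the number of workers at 2 for this sketch; fall back to 1 if the
# core count cannot be detected.
available <- detectCores()
no_cores <- if (is.na(available)) 1 else max(1, min(2, available - 1))
cl <- makeCluster(no_cores)

# Make the helper function available on every worker.
clusterExport(cl = cl, varlist = c("square_value"), envir = environment())

# parLapply returns a list with one element per input value.
squares <- parLapply(cl, 1:8, function(x) square_value(x))

# Release the workers when done.
stopCluster(cl)

# Flatten the list into a plain vector.
squares_vec <- unlist(squares)
```

On the 16 CPU machine you would of course drop the cap and use detectCores() - 1 as in the setup code above.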

Many more examples can be found on: http://gforge.se/2015/02/how-to-go-parallel-in-r-basics-tips/
  
result = foreach(j=seq(1,nl, by=(nl/parts))) %dopar% {
}
}
}
do.call("rbind", result)
}
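Since the loop body is not shown above, here is a minimal self-contained sketch of the same pattern: split a dataset into chunks with foreach, process each chunk on a worker, and combine the returned list with do.call("rbind", ...). The data frame, the chunk operation and the names dataset and chunk_size are made up for illustration. It also uses the doParallel backend on a 2-worker cluster rather than doMC, which is an assumption on my side so that the sketch also runs on Windows.

```r
library(foreach)
library(doParallel)

# Hypothetical example data; the real dataset is not part of this post.
dataset <- data.frame(id = 1:12, value = (1:12) * 10)
nl <- nrow(dataset)
parts <- 4
chunk_size <- nl / parts  # assumes nl is divisible by parts

# Start a small cluster and register it as the foreach backend.
cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration handles one chunk of rows and returns a data frame.
result <- foreach(j = seq(1, nl, by = chunk_size)) %dopar% {
  chunk <- dataset[j:(j + chunk_size - 1), ]
  chunk$doubled <- chunk$value * 2  # placeholder for the real work
  chunk
}

stopCluster(cl)

# foreach returns a list of data frames; bind them into one dataset.
combined <- do.call("rbind", result)
```

The list has one element per chunk, and the rbind step restores a single data frame with the same number of rows as the input.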

Sunday, March 19, 2017

Copy Paste Images from Clipboard in Blogger / Blogspot

I have always found it a bit annoying that it's necessary to upload images to Blogspot before being able to display them:
For this reason, I started looking for a Windows application with a paste-from-clipboard option for images. Unfortunately, I was not able to find a free/open source solution, so I decided to build a workaround.

In this post I will explain how this works.

The basic idea is to use Greenshot for image capturing, an AWS server that hosts the images, a PuTTY pscp script to upload the file, and an HTML snippet copied to the clipboard which you can paste into Blogger.

This shows how the final solution works:

First start a screen capture with Greenshot:



And then select the option "upload". The script will then kick off and upload the screen capture to the AWS Server and return the HTML code that needs to be copied into Blogger:


This code is then copied into the HTML part of Blogger:
Within Greenshot it's possible to specify external commands. With this option a batch file is started:

In here we configure the pointer to the batch file:
Also, in the settings, set the file name format of the capture to hh mm so that the file names do not contain spaces.



The batch file looks like this:


This script takes the path of the screen capture as a parameter and then uploads the file to the AWS web server. Unfortunately, the clip.exe command turned out to be buggy, so I write the HTML code to a text document and then open this document with Notepad.

The web server is a default, free-of-charge web-hosting AWS AMI:
First navigate to EC2 (Elastic Compute Cloud):

Launch an instance:

Search for the WordPress AMI:

Select the free hosting option:

Then press 'next' until you get the option 'launch', and press this as well. You will then see the screen to select a key pair; this is important:

Download the PEM file; we need this in the next step:
Use this file to create a ppk file. These steps are described here:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html

Then, build the transfer script as described in the manual:

This script is part of the upload script: