Tips to set up RStudio on an Ubuntu cloud server

This past week I’ve been struggling with memory issues when using R. Computations were taking too long and often ended in ‘cannot allocate vector of size xx’ errors, so I finally decided to move to the cloud. In this blog post I will share the resources I found helpful and some tips I gained from experience.

  1. Set up Google Cloud Compute Engine- VM Instance
  2. Install R and RStudio
  3. Additional tips

Section 1. Set up Google Cloud Compute Engine- VM instance

  1. Set up a Google Cloud Platform account.
  2. From the Google Cloud Platform home page, navigate to the ‘Compute Engine’ dashboard.
  3. Select the ‘Create Instance’ option from the top navigation bar.
  4. Name your instance and select a zone (I chose the default zone, i.e. us-central1-b).
  5. Select the machine type. I needed more RAM, so I chose ‘4 vCPUs, 26GB RAM, n1-highmem-4’. Based on GCE’s recommendation I might downgrade to ‘2 vCPUs, 13GB RAM, n1-highmem-2’, as my memory is underutilized.
  6. Select the boot disk. I chose ‘Ubuntu 14.04 LTS’ with a standard persistent disk and 80GB disk size.
  7. Skip to Firewall and allow HTTP traffic.
  8. Expand ‘Management, disk, networking, SSH keys’. In the metadata section, add your public key. If you do not already have a key, follow the instructions here. I used PuTTY to create the public and private keys. Copy and paste the public key into the metadata section. NOTE: before saving the public key, change “Key comment” to a username that is easy to remember. This username will be used to log in with PuTTY. Then continue to “Save public key” and “Save private key”. Open the private key using Pageant.
  9. Finally, create the instance.
    Usually this is the last step before you can continue your work. However, I have added an extra step to my setup process based on issues I faced in the past.
  10. By default your disk space is limited to 10GB. To use the full disk space you assigned (80GB in my case), you need to resize the partition. Follow the steps from “Repartition a root persistent disk”.
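
To confirm that the resize took effect, you can check the size of the root filesystem from an SSH session on the instance. The snippet below is a minimal sketch that relies only on the standard df and awk tools; the helper name root_disk_summary is just an illustrative choice.

```shell
# Print size, used and available space of the root filesystem.
# -P forces one line per filesystem, -h prints human-readable sizes.
root_disk_summary() {
    df -Ph / | awk 'NR==2 {print "size=" $2, "used=" $3, "avail=" $4}'
}

root_disk_summary
```

If the reported size is still around 10GB rather than the size you assigned, the partition has probably not been resized yet.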

Section 2. Install R and RStudio

To install R, follow instructions from Digital Ocean’s “How to Set Up R on Ubuntu 14.04” guide. It worked perfectly for me.

To install RStudio, follow the instructions from Digital Ocean’s “How to Set up RStudio on an Ubuntu Cloud Server”. After creating the RStudio user, return to this page before proceeding to use RStudio.

  1. Create the rstudio user: sudo adduser rstudio
  2. Add user rstudio to the sudo group. Without sudo permissions for rstudio, I could view files and process data but not save any files.

    sudo adduser rstudio sudo

  3. Now open the PuTTY desktop app.
  4. In the “Host Name (or IP address)” field, enter the external IP of your instance. (The external IP can be found on the Compute Engine home page.)
  5. Expand SSH in the left menu, then go to Auth.
  6. Browse and add your private key file here. By default this file is saved in “C:/users/{username}/.ssh/{filename}.ppk”.
  7. From the menu on the left, go to “Tunnels”.
  8. In “Source port”, enter 8787 (from the “How to Set up RStudio on an Ubuntu Cloud Server” guide). In “Destination”, enter {Internal IP}:{port}. The internal IP is shown on the Compute Engine home page. The destination port must be the port RStudio Server listens on, which is 8787 by default; the source port can be any free port on your local machine. I like to use the same number for both because it’s easy to remember. Click the ‘Add’ button.
  9. [OPTIONAL] Return to “Session” at the top of the left menu. In the “Saved Sessions” field enter a name, then click ‘Save’. Next time you need to open the same session, select the saved session from the list and click ‘Load’.
  10. Now click ‘Open’, and a console will appear asking you to log in. Enter the username you selected in step 8 of ‘Section 1 – Set up Google Cloud Compute Engine – VM Instance’ above.
  11. You should be able to log in successfully. If login fails, make sure your private key is loaded in Pageant and that your username matches the username in your public key. To confirm the username, go to the Compute Engine home page and select your instance. Scroll down to SSH keys and note your username. Restart the PuTTY session and try to log in again.
  12. Once you are logged in, open a browser and go to “localhost:8787”. (NOTE: the port number should match the source port you entered in step 8 above.) You should now see the RStudio sign-in page.
  13. Return to Digital Ocean’s “How to Set up Rstudio on an Ubuntu cloud server” and complete the last step “Using Rstudio”
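
If you connect from Linux or macOS (or use OpenSSH on Windows) instead of PuTTY, the tunnel from the steps above can be opened with a single ssh command. The following is a rough sketch and not part of my original setup: every value is a placeholder, and it assumes the private key is in OpenSSH format (a PuTTY .ppk file would first have to be exported from PuTTYgen).

```shell
# Placeholder values; replace them with your own.
EXTERNAL_IP="203.0.113.10"          # external IP of the instance (documentation-range address)
INTERNAL_IP="10.128.0.2"            # internal IP of the instance
SSH_USER="myuser"                   # username from the key comment
KEY_FILE="$HOME/.ssh/my_gce_key"    # hypothetical OpenSSH private key path
LOCAL_PORT=8787                     # port you will open in the browser
REMOTE_PORT=8787                    # port RStudio Server listens on

# Build the command as a string so it can be inspected before running it.
tunnel_cmd="ssh -i $KEY_FILE -N -L $LOCAL_PORT:$INTERNAL_IP:$REMOTE_PORT $SSH_USER@$EXTERNAL_IP"
echo "$tunnel_cmd"

# To actually open the tunnel, run the equivalent command:
# ssh -i "$KEY_FILE" -N -L "$LOCAL_PORT:$INTERNAL_IP:$REMOTE_PORT" "$SSH_USER@$EXTERNAL_IP"
```

Here -L forwards the local port to the instance, and -N keeps the connection open without starting a remote shell. While the command is running, RStudio is reachable at localhost followed by the local port.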

Section 3. Additional Tips

When running a time consuming task, the last thing you want is for your connection to be interrupted. This kept happening to me every time I stepped away from my computer. I took a few precautions and thankfully haven’t faced any interruptions since. The solutions aren’t my own so I’ll just share my finds.

  1. Turn off Windows 10 automatic updates: Prevent Windows 10 from automatically restarting your PC after Updating
  2. Use keepalives in PuTTY: source
  3. Screen: I didn’t actually get to try this, but it seems like an awesome tool to use when working on remote servers.
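
Since I have not tried screen myself, treat the following as a sketch of the usual workflow rather than something I have verified on this setup. It assumes screen is installed on the server (on Ubuntu: sudo apt-get install screen); the helper names and the script name long_job.R are hypothetical.

```shell
# Start a command inside a detached screen session.
# $1 = session name, remaining arguments = command to run.
start_detached() {
    name="$1"
    shift
    screen -dmS "$name" "$@"
}

# Reattach to a running session by name.
# Detach again with Ctrl-A followed by D; the job keeps running.
reattach() {
    screen -r "$1"
}

# Example usage (hypothetical script name):
# start_detached rjob Rscript long_job.R
# reattach rjob
```

Because the job runs inside the screen session on the server, a dropped PuTTY connection should not stop it; you log back in and reattach.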

RStudio tips:

First, remember to give sudo access to your rstudio user. It’s frustrating not being able to save your work.

Secondly, to install missing packages, I continued to use the method from “How to Set up R on Ubuntu 14.04” so that all users would have access.

sudo su - -c "R -e \"install.packages('shiny', repos = 'http://cran.rstudio.com/')\""
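
When several packages are missing, the same command can be wrapped in a small helper. This is only a convenience sketch around the command above; the function name install_r_pkg and the second package in the usage example are my own illustrative choices.

```shell
# Install one R package system-wide, using the same command as above.
# $1 = package name.
install_r_pkg() {
    pkg="$1"
    sudo su - -c "R -e \"install.packages('$pkg', repos = 'http://cran.rstudio.com/')\""
}

# Example usage (package names are illustrative):
# for p in shiny ggplot2; do
#     install_r_pkg "$p"
# done
```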

Thirdly, use the RStudio cloud server ‘Upload’ and ‘Export’ options to transfer files.

To get started, I first needed to transfer data to my VM instance. This can be done using the Google Cloud SDK shell; see the gcloud compute copy-files documentation.

gcloud compute copy-files {source} {destination}

NOTE: there is a bug in this command: it does not accept paths like C:/users …. , so you first need to change directory to the folder where your files are and then transfer them to the remote directory. Read more here. Using this method I failed to download files from the remote directory to my local machine.
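
The change-directory workaround can be sketched as below. It is written for a POSIX shell, so on the Windows Cloud SDK shell you would type the cd and gcloud commands directly instead. The function name and all argument values are hypothetical. Note also that copy-files has since been deprecated in favor of gcloud compute scp, which takes the same source and destination form.

```shell
# Upload one file to the VM, changing into its folder first so that
# only a relative file name is passed to gcloud.
# $1 = local folder, $2 = file name, $3 = instance name,
# $4 = remote directory, $5 = zone.
upload_to_vm() {
    (
        cd "$1" || exit 1
        gcloud compute copy-files "$2" "$3:$4" --zone "$5"
    )
}

# Example usage (all values are placeholders):
# upload_to_vm "$HOME/data" mydata.csv my-instance /home/rstudio us-central1-b
```

The parentheses run the two commands in a subshell, so your own shell stays in its original directory afterwards.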

Rstudio provides an easier way to do this.

Notice the ‘Upload’ and ‘Export’ options in RStudio cloud server. ‘Upload’ lets you upload files from your local machine to the cloud server, while ‘Export’ lets you download from the cloud server directly to your local machine. Isn’t that neat? Thanks, RStudio team =D

Hope you find this helpful. Happy coding!

 
