Trying R

Try R

R is a tool for statistics and data modeling. It can be used for descriptive statistics, inferential statistics and plotting charts.

The Try R course that I completed (http://tryr.codeschool.com/) contains the 7 sections listed below. Every section is broken down into several parts that give you some context for how R can be used, and you can practice on the provided examples. You need to complete each part before moving on to the next one.

[Image: R completion]

Sections of Try R course:

  1. Using R – an introduction to R expressions, variables, and functions. This chapter covers basic expressions like numbers, strings and TRUE/FALSE values. Here you can try some simple calculations and check whether given statements are true or false. Next it shows how to store values in a variable that can be accessed later, and how to pass values to functions.
  2. Vectors – Grouping values into vectors, then doing arithmetic and graphs with them.
  3. Matrices – Creating and graphing two-dimensional data sets. In this section I basically learnt how to work with data stored in rows and columns: how to create matrices, access and set their values, and plot them. When it comes to plotting, R provides powerful visualisations for matrix data.
  4. Summary Statistics – Calculating and plotting some basic statistics: mean, median, and standard deviation.
  5. Factors – Creating and plotting categorized data.
  6. Data Frames – Organizing values into data frames, loading frames from files and merging them.
  7. Working With Real-World Data – Testing for correlation between data sets, linear models and installing additional packages.
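
To give a flavour of what those sections cover, here is a minimal sketch in base R. The variable names and values are invented for illustration and are not taken from the course.

```r
# All values below are invented for illustration only.

# 1. Expressions, variables and functions
total  <- 2 + 3                    # store the result of an expression
is_big <- total > 4                # a logical (TRUE/FALSE) value
label  <- paste("Total:", total)   # pass values to a function

# 2. Vectors: arithmetic is applied element by element
rain <- c(10, 20, 30)
rain_doubled <- rain * 2

# 3. Matrices: values arranged in rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3)
m[2, 3] <- 0                       # set the value in row 2, column 3

# 4. Summary statistics
avg    <- mean(rain)
middle <- median(rain)
spread <- sd(rain)

# 5. Factors: categorised data
sizes <- factor(c("small", "large", "small"))
size_counts <- table(sizes)

# 6. Data frames: columns of equal length, possibly of different types
df <- data.frame(month = c("Jan", "Feb", "Mar"), rain = rain)
wet_months <- df$month[df$rain > 15]

# 7. Correlation and a simple linear model
x <- c(1, 2, 3)
r <- cor(x, rain)
fit <- lm(rain ~ x)
```

Typing a variable name such as size_counts or fit at the console prints its value, which is how the course lets you inspect each step.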

I have to say, while the Try R course was very interesting and gave me some idea of how powerful R can be, trying it myself wasn’t as easy as it seemed. The course guides you smoothly through the exercises, but if you want to do something different, you need to practice a bit more.

Now Try RStudio

I worked on a dataset from https://stats.oecd.org containing monthly Comparative Price Levels (CPL) for OECD countries, which I downloaded as a CSV file. The data show the ratios of PPPs (purchasing power parities) for private final consumption expenditure to exchange rates. They provide measures of differences in price levels between countries, showing the number of specified monetary units needed in each of the countries listed to buy the same representative basket of consumer goods and services. In each case the representative basket costs a hundred units in the country whose currency is specified.

After downloading the file I had to clean it up a little, deleting the first and last rows, which contained unnecessary text. I also needed to remove blank columns.
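
The same clean-up could probably be done in R rather than by hand. The sketch below is only an illustration under assumptions: the raw lines, the column names and the numbers are invented, and the real OECD export may have a different number of title and footer lines.

```r
# Invented stand-in for a raw export: one title line at the top, one footer
# line at the bottom and an empty column in the middle. Not the real OECD file.
# For a real file the lines could be read with readLines("path/to/file.csv").
raw <- c(
  "Some title text",
  "Country,,Ireland.EUR",
  "A,,100",
  "B,,90",
  "Some footer text"
)

# Drop the first and last lines, then parse what is left as CSV
body <- raw[-c(1, length(raw))]
cpl <- read.csv(text = body, header = TRUE)

# Keep only the columns that contain at least one non-missing value
is_blank <- sapply(cpl, function(column) all(is.na(column)))
cpl <- cpl[, !is_blank, drop = FALSE]
```

After this, names(cpl) and nrow(cpl) are a quick way to check that only the expected columns and rows remain.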

Once the file was cleaned, I imported it into R using the following statement (the data frame is called PPP in the rest of this post):

PPP <- read.csv("C:/location/File_name.csv", row.names=1)

I decided to visualise price levels in Ireland compared with other countries, starting with a simple plot:

plot(PPP$Country,PPP$Ireland.EUR)

[Figure: Rplotbasic]

The x axis shows the country and the y axis shows purchasing power.

To make the visualisation more user friendly, I added axis labels and a title as follows:

plot(PPP$Country,PPP$Ireland.EUR, xlab="Country", ylab="Purchasing Power", main="Comparative Price Levels Ireland 2016", las=2)

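[Figure: Rplotbasic2]

The “las=2” argument sets the style of the axis labels so that they are always drawn perpendicular to their axis. The country names on the x axis are therefore written vertically, which is more readable because all the labels fit in.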
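
To see what las does, the four possible values can be compared side by side. This is a small sketch with invented labels and values, not the OECD data, and it uses barplot rather than plot so that plain text labels can be passed in directly.

```r
# Invented labels and values, for illustration only
labels <- c("Country A", "Country B", "Country C")
values <- c(80, 100, 120)

pdf(NULL)                          # null device: nothing is written to disk
old_par <- par(mfrow = c(2, 2))    # arrange the plots in a 2 x 2 grid

for (style in 0:3) {
  # las = 0: labels parallel to the axis (the default)
  # las = 1: labels always horizontal
  # las = 2: labels always perpendicular to the axis
  # las = 3: labels always vertical
  mids <- barplot(values, names.arg = labels, las = style,
                  main = paste("las =", style))
}

par(old_par)                       # restore the previous layout
dev.off()
```

To view the result on screen in RStudio, leave out the pdf(NULL) and dev.off() lines.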

We can also create a barplot, which gives a better view of the data.

barplot(PPP[,16],names.arg=PPP[,1])

 

[Figure: Rplot1]

Again, I wanted a better view, with labels visible for all the columns rather than just a few of them. Without them the barplot does not give much insight, and in the default layout the labels simply did not fit.

barplot(PPP[,16], main="Comparative price levels Ireland Feb 2016", xlab="Country", ylab="Purchasing power", names.arg=PPP[,1], col="blue", las=2)

[Figure: Rplot3]

The barplot shows the cost, per country, of purchasing the same basket of consumer goods and services (worth 100 monetary units in Ireland). We can clearly see that Switzerland is much more expensive: for goods worth 100 euro in Ireland we would pay over 130 euro there, while the same goods would cost around half as much in a few other countries, such as Hungary, Mexico, Poland or Turkey.
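
Because every value is relative to a basket costing 100 in Ireland, the numbers can be read directly as percentages of the Irish price. The short sketch below uses invented values (not the real OECD figures) to show the arithmetic, and how sorting the bars might make the comparison easier to read.

```r
# Invented price levels with Ireland = 100; these are not the OECD figures
cpl <- data.frame(
  Country = c("Ireland", "Country A", "Country B", "Country C"),
  Level   = c(100, 130, 50, 85)
)

# Difference from the Irish price, in percent
cpl$DiffPct <- cpl$Level - 100

# What goods worth 250 EUR in Ireland would cost in each country
cpl$CostOf250 <- 250 * cpl$Level / 100

# Sort from cheapest to most expensive before plotting
cpl <- cpl[order(cpl$Level), ]

pdf(NULL)   # null device; drop this line and dev.off() to see the plot
barplot(cpl$Level, names.arg = cpl$Country, las = 2,
        col = ifelse(cpl$Country == "Ireland", "darkgreen", "blue"))
abline(h = 100, lty = 2)   # dashed reference line at the Irish price level
dev.off()
```

Highlighting the reference country and adding a line at 100 is just one design choice; it makes it easy to see which countries sit above and below the Irish level.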

ggplot2 Visualisation package

I also installed the ggplot2 visualisation package and used a dataset I downloaded from the Met.ie website. I created a graph showing rainfall in Ireland in 2015 using the qplot function:

qplot(month, rain, data=weather, col=rain, main="Rainfall in Ireland 2015")

[Figure: Qplot Rainfall]

I played with it and created a line chart with points using the ggplot function:

ggplot(weather, aes(x=month, y=rain)) + geom_line(size=1, color="blue") + geom_point(size=3, color="blue") + ggtitle("Rainfall in Ireland in 2015") + geom_area(fill="blue", alpha=.2)

[Figure: GGplotRain]
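
One thing to watch with a chart like this is the type of the month column. If it holds month names as text, ggplot2 orders them alphabetically and geom_line may not connect the points. The sketch below assumes that situation; the weather data frame, its column names and its values are invented stand-ins for the Met.ie data, and the ggplot2 package needs to be installed.

```r
library(ggplot2)

# Invented stand-in for the weather data, with month names stored as text
weather <- data.frame(
  month = c("Jan", "Feb", "Mar", "Apr"),
  rain  = c(120, 80, 95, 60)
)

# Turn the month into a factor whose levels follow calendar order
# (month.abb is R's built-in vector "Jan", "Feb", ..., "Dec")
weather$month <- factor(weather$month, levels = month.abb)

# group = 1 tells ggplot2 to join all the points with a single line
p <- ggplot(weather, aes(x = month, y = rain, group = 1)) +
  geom_line(color = "blue") +
  geom_point(size = 3, color = "blue") +
  ggtitle("Rainfall (invented values)")
```

Printing p (or calling print(p) inside a script) draws the chart. If the month column is numeric instead, neither the factor conversion nor group = 1 should be needed.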

The same data can be visualised using the barplot function:

barplot(weather[,6], main="Rain in Ireland 2015", xlab="Month", ylab="Rainfall", names.arg=weather[,1], col="darkblue")

[Figure: Rplot rain 2015 darkblue]

We can of course visualise other data from this dataset. The plot below shows the maximum temperatures in Ireland in 2015.

barplot(weather[,2], main="Max Temp in Ireland 2015", xlab="Month", ylab="Temperature", names.arg=weather[,1], col="red")

[Figure: Rplot max temp 2015]

 
