1 General introduction
r is a programming language that can be used to develop tools that harness the potential of the Internet, and it can help you develop a holistic approach to your work routine.
This seminar aims only to provide a gentle introduction to some of the things that r can do for you. After the seminar, those interested can pursue their interests further and use the advanced resources provided to dive deeper into the topics.
Its contents target skills that can be applied throughout university studies as well as on the job market.
Why this seminar?
There are a few reasons why someone might be interested in using r beyond data analysis.
1.0.1 Reason 1
First, r is a programming language, so it is a gateway to other “hard-core” programming languages. For someone who wants to reinvent themselves, r can be a useful companion during the transition from closed-ended data analysis software like SPSS, Mplus or Stata toward a language-based logic in data analysis, which can be elevated to a holistic workflow.
When using data analysis software like SPSS, Mplus or Stata, one has at one’s disposal powerful tools specifically designed for data analysis. This also means, however, that one needs to master a number of other programs for writing up, storing and disseminating one’s work. A typical work routine involves:
- (a) importing a dataset into the preferred data analysis software, like SPSS,
- (b) performing the needed analyses,
- (c) copy-pasting the resulting output into a text editor like Word,
- (d) going back and forth between steps a, b, and c until the final results are ready,
- (e) saving the manuscript as a PDF copy.
Sometimes, when one follows open science practices, there are a couple of extra steps involved:
- (f) uploading the PDF copy to an online repository,
- (g) uploading scripts and material to the online repository (after ensuring good documentation).
Meanwhile, with the help of r, one can develop a work routine in which steps “a” through “g”, as well as several others, are integrated over time into a single workflow.
1.0.2 Reason 2
The second reason why one might be interested in r beyond data analysis is the appeal of using an open source and community-maintained work environment.
Usually, a dedicated team of specialists develops and maintains software like SPSS, Mplus or Stata, which more often than not is available only at a cost. r, on the other hand, is open source, which means that the code is publicly available and everyone can contribute to its development. The CRAN website hosts an archive and recent developments.
1.0.3 Reason 3
Third, r is a programming language around which a number of excellent tools have been developed. All of these tools, some of which are covered in this seminar, are likewise open source and can be used in an integrated work environment. This means that if one wants to transition to r, one gains access to a universe of new possibilities including, for example, creating websites and web applications, as well as working seamlessly and simultaneously with several other programming languages, including python and SQL.
What else is good to know?
1.0.4 Some wizardry stuff
Well, if you know programming, it is all too easy. If you don’t, nothing makes sense.[1]
One of the first things that one needs to do before using r beyond data analysis is to connect all the dots, so to speak. Using r beyond data analysis relies on an integrated work environment that includes the programming language itself, r, a work environment interface like RStudio, as well as online repositories and servers, both for general purposes, like GitHub, and for specific purposes, like shinyapps.io for shiny apps.
To seamlessly write code and publish it online while ensuring that it does what it is supposed to, there has to be an open channel between all these elements – the work environment needs to be integrated, that is.
To integrate all of these things one needs to:
- open an account on these platforms,
- establish a communication channel between the platforms and the working machine (your personal computer),
- encrypt this communication channel.
I won’t cover all the required steps in detail here. This online resource provides everything one needs.
For the sake of simplicity, we focus on the one piece that happens to be fundamental to the workflow we address in this seminar: one needs to have Git installed on one’s local machine. You might already have it, so please check first. Note that installing Git might take some time, so don’t be surprised if that happens.
- Install for Windows by downloading from https://gitforwindows.org/.
- Install for Mac or Linux using Homebrew. Follow the steps at https://brew.sh/.
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency (https://git-scm.com/).
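Checking whether Git is already installed, as suggested above, can be sketched in a terminal roughly as follows. This is a minimal sketch for a POSIX-style shell; the variable name `git_status` is only illustrative, and the commented `brew install git` line assumes Homebrew has already been set up.

```shell
# Check whether the git command is available on this machine.
if command -v git >/dev/null 2>&1; then
  git_status="installed"
  # Print the installed version (a line starting with "git version").
  git --version
else
  git_status="missing"
  echo "Git not found: install it using the links above."
  # On Mac or Linux with Homebrew already set up, this would be:
  #   brew install git
fi
echo "Git is $git_status"
```

If the last line reports that Git is missing, follow the installation links above and run the check again.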
1.0.5 GitHub
Technically, this section and the one above are difficult to tease apart. For those interested, this online resource can answer further questions and is a good starting point for an advanced workflow with r, Git and GitHub.
GitHub is an online platform that facilitates collaboration, storage and publishing of almost anything programming-related. It is an online and publicly accessible repository host, in that anyone with an account can create repositories and upload and download code and projects. Basically, it is facebook for programmers.
A relative of GitHub is GitLab, which institutions often run on their own servers for internal use. If you want to publish your code online (websites and online books, for example) and make it accessible to everyone in the world, then you should use GitHub. If, however, you’d like to work on a project internally, only with colleagues from your institution (or other registered institutions), you should use GitLab, which is available through your institution. At the University of Luxembourg, there is a designated GitLab platform.
Open an account on https://github.com/.[2]
After the GitHub account is live, the next step is to open and encrypt the communication channel between your local machine and your GitHub account. This step can be tricky, so take your time and equip yourself with lots of patience. All the steps can be found here. A simplified, and somewhat visual, description is also provided here.[3]
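One common way to open and encrypt this channel is an SSH key pair. Whether the guides linked above use SSH or another method (such as HTTPS with a token) is an assumption here, so treat the following as a sketch rather than the required procedure. It writes a demonstration key into a throwaway folder so that no existing key is overwritten; the e-mail address and the empty passphrase are placeholders for illustration only.

```shell
# Create a throwaway folder for the demonstration key
# (a real key usually lives in ~/.ssh).
key_dir=$(mktemp -d)
key_file="$key_dir/id_ed25519"

# Generate an Ed25519 key pair.
#   -C adds a comment (placeholder e-mail), -f sets the output file,
#   -N "" sets an empty passphrase (demo only), -q keeps the output quiet.
ssh-keygen -q -t ed25519 -C "you@example.com" -f "$key_file" -N ""

# The public half is what gets registered in the GitHub account settings.
cat "$key_file.pub"

# Once a real key is registered, the connection can be tested with:
#   ssh -T git@github.com
```

For everyday use one would normally protect the key with a passphrase and keep the private file (the one without the .pub ending) to oneself.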
1.0.6 Pushing, pulling, cloning and committing
It is helpful to first understand the concepts of pushing, pulling, cloning and committing. These are verbs, so they indicate actions that one can perform. All of these actions concern the code one writes: where the code currently is and where one wants it to be placed.
pushing is basically the action of uploading the written code or files from the local machine onto the online repository through the distributed version control system, Git that is. pushing only resembles uploading, because pushing code onto an online repository means simultaneously uploading it and creating a history of the code in the project. In some cases, pushing code also “activates” its functions. In Chapter 4, for example, we will see that the code written on the local machine becomes a book or a website once it is pushed onto the GitHub online repository.
pulling is in many ways the opposite of pushing. In this case, one downloads the code or files from the online repository to the local machine. This can come in handy when one is working with others on a common project and, while one was away, someone else has updated the project; someone else has pushed a code update, for example. It is also useful when one works on several machines or when one has deleted the project from the local machine by mistake, which can happen!
cloning is in many ways copy-pasting a repository from the online GitHub server to the local machine. The outcome is straightforward: cloning a repository to the local machine means that the history, code changes and dependencies are also reproduced on the local machine.
committing is exactly what you think it might mean in English: to commit to something or someone has an aspect of finality to it, of something enduring or fixed, if you will. When writing or changing code (or files, for that matter) on the local machine, you commit it to your project when you are happy with it. The updated code is then integrated in the project and can be traced back in the history of the project. The nice thing about working this way is that once a code update or file is committed to the project, the project itself is updated accordingly.
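The four actions can be tried out without a GitHub account. In the sketch below, a local “bare” repository stands in for the online repository; that stand-in, the folder names (online.git, laptop, colleague), the file analysis.R and the demo identity are all made up for illustration.

```shell
# Identity for the demo commits (placeholder values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Work in a throwaway folder so that no real project is touched.
demo=$(mktemp -d)
cd "$demo"

# A local bare repository stands in for the online repository on GitHub.
git init --quiet --bare online.git

# cloning: copy the repository (and its history) to the "local machine".
# Git warns that the repository is empty; that is expected here.
git clone --quiet online.git laptop
cd laptop
branch=$(git symbolic-ref --short HEAD)

# committing: record a new file in the history of the project.
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

# pushing: upload the committed work to the online repository.
git push --quiet origin "$branch"

# A collaborator (a second clone) commits and pushes an update.
cd "$demo"
git clone --quiet online.git colleague
cd colleague
echo "y <- 2" >> analysis.R
git commit --quiet -am "Extend analysis script"
git push --quiet origin "$branch"

# pulling: download the collaborator's update to the first machine.
cd "$demo/laptop"
git pull --quiet --ff-only origin "$branch"
git log --oneline
```

With a real project, the address of the GitHub repository would take the place of online.git, and the rest of the commands would stay the same.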
1.0.7 GitHub client
Writing code and creating Rmarkdown files on the local machine is rather straightforward. For that, one needs only an r client, and RStudio (download here; covered in more detail in Chapter 2) is typically the preferred one.
However, as soon as online repositories, collaborative work and the like become relevant, one needs to communicate with these non-local machines.
One way to do this is by typing git commands on the command line, which can be accessed via the Terminal in RStudio. This can be straightforward and eventually becomes routine. This cheat sheet provides all the necessary git commands.
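A few of the git commands one tends to type most often in the Terminal can be tried out safely in a throwaway folder. The sketch below is illustrative only: the folder, the file name notes.Rmd, the commit message and the demo identity are made up for the example.

```shell
# Placeholder identity for the demo commit.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Start a fresh repository in a throwaway folder.
project=$(mktemp -d)
cd "$project"
git init --quiet

# Create a file; git status lists it as untracked ("??" in the short format).
echo "# Notes" > notes.Rmd
status_before=$(git status --short)
echo "$status_before"

# Stage and commit the file.
git add notes.Rmd
git commit --quiet -m "Add notes"

# Change the file and show what differs from the last commit.
echo "More notes" >> notes.Rmd
git diff

# List the history, one line per commit.
git log --oneline
```

Running git status before every commit is a cheap way to see what is about to be recorded.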
Meanwhile, if one prefers using git through a friendlier visual interface, then one would want a GitHub client. GitHub Desktop can be downloaded for free and has a simple interface. Check this resource for getting started with GitHub Desktop.
Illustrative example
1.0.8 Background
Throughout Chapters 3 to 5, we will use a sample of the data reported in Stanciu et al. (2017).[4]
Stanciu et al. (2017) studied how people stereotyped various social groups in terms of warmth and competence across several regions in Romania. For this seminar, we will use data from a sample of n = 100 participants selected at random from the reported data set.
The data include the following variables:
- ppn: participant number,
- gen: self-reported gender of the participant as female (1) or male (2),
- age: chronological age, self-reported in years,
- res: region of residence of the participant,
- res_other: open-ended question regarding the region of residence of the participant,
- men_warm: participant’s stereotypical evaluation of men in terms of warmth,
- men_comp: participant’s stereotypical evaluation of men in terms of competence,
- wom_warm: participant’s stereotypical evaluation of women in terms of warmth,
- wom_comp: participant’s stereotypical evaluation of women in terms of competence.
Stereotypical evaluations were assessed on Likert scales with these answer options:
1 = strongly disagree, 2 = disagree, 3 = undecided, 4 = agree, 5 = strongly agree.
Data and meta-data referred to throughout this short book are downloadable directly from inside this book. Navigate to the left panel of the book and press the download icon under the book title.
Likewise, all r scripts and the .Rmd and .qmd illustrative examples are provided in .zip compressed files.
1.0.9 The plan
In Chapter 3 we will use this sample to illustrate how certain steps in working with data can be automated. We will write static text and “living” text, whereby we use r code to populate text dynamically with information retrieved automatically from the data.
In Chapter 4 we will see how the work from the previous chapter can be integrated in a self-published book or used as content for a personal website. We will focus on creating tables and graphs using the sample.
In Chapter 5 we will see how we can present results in an interactive manner using online applications. We will focus on how to create tables and graphs as well as “live” texts for the online app.
[1] I estimate I know about 0.01%.
[2] Note that it is not quite clear where the data is stored on these servers. So, if you are concerned about data protection issues, be sure you do not upload sensitive information. For the sake of this seminar this is not an issue, but be warned!
[3] This resource is also a step-by-step guide for creating a website using r and associated tools. This will be covered in Chapter 4 of this short book.
[4] The article can also be downloaded via Orbilu at the University of Luxembourg. See here.