Data analysis is integral to the majority of PhD students, as it enables them to interpret and understand their data, test hypotheses and develop recommendations relevant to their field. Depending on your discipline, you will be trained in a particular piece of software. As you move towards the end of your PhD, you might identify your data analysis and data management skills as transferable skills that apply to a range of different industries. Before you get there, or maybe even in the last year of your PhD, you may want to brush up on your knowledge or learn a new piece of software. The most widely used tools, or ‘languages’, are typically R and Python. They are popular largely because they’re open source: they’re free to download, which means anyone can use them. Because so many people are proficient in them, it’s also a lot easier to find answers when you are troubleshooting a particular problem. These languages essentially allow you to import, clean, manage, manipulate, analyse and visualise data in arguably any way imaginable.
What are Python and R?
Both Python and R are open source programming languages that use functions and arguments to work with almost any data file. Typically, you will import an Excel file, as this is a common way to collect and store data, but both languages also handle other formats well, such as CSV files, text files, or files from other software such as SPSS. Once the data is imported, you can do all the data churning you need! Both languages run on Windows, Mac and Linux (as well as other Unix-like) operating systems, which is another reason they are so popular. Publications and reports often use Python or R to generate polished graphs and figures to communicate data.
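To give a flavour of the import, clean and analyse steps described above, here is a minimal sketch in Python. It uses only Python's built-in modules so that it runs without installing anything; in practice many people would reach for a dedicated data library instead. The column names (participant, group, score) and the data itself are made up for illustration, and the text block stands in for a CSV file on disk.

```python
import csv
import io
from statistics import mean

# Hypothetical data standing in for a CSV file on disk.
# Participant 2 has a missing score, which we will need to clean up.
RAW_CSV = """participant,group,score
1,control,12.5
2,control,
3,treatment,15.0
4,treatment,17.5
"""


def load_rows(text):
    """Import: parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Clean: drop rows with a missing score and convert scores to numbers."""
    return [
        {**row, "score": float(row["score"])}
        for row in rows
        if row["score"].strip()
    ]


def mean_score_by_group(rows):
    """Analyse: work out the average score for each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(row["score"])
    return {group: mean(scores) for group, scores in groups.items()}


rows = clean_rows(load_rows(RAW_CSV))
summary = mean_score_by_group(rows)
print(summary)
```

To read a real file instead, you would open it with Python's built-in open() function and pass the file object to csv.DictReader in place of the io.StringIO wrapper.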
However, Python and R are not the same. The key difference is their syntax. Syntax can be thought of as the ‘grammar’ you use when writing code. Most commands are written in English, but how they’re abbreviated and where the full stops and commas go differ between the two languages. They also use different functions to manipulate your data. For instance, producing a bar chart requires a different command in each language, each with its own syntax. This is why learning them takes time. Once you’ve mastered one language, you cannot simply jump to another and know what to do. However, learning one language makes learning a second one considerably easier. It is like learning Portuguese after learning Spanish: they’re not the same, but knowing the key terms, how basic grammar works and how to troubleshoot gives you the building blocks for learning a second language.
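As a small illustration of what those syntax differences look like, the sketch below does a few basic operations in Python, with the rough R equivalent shown in a comment beside each line. The scores are made up for illustration.

```python
from statistics import mean

scores = [12.5, 15.0, 17.5]   # R: scores <- c(12.5, 15.0, 17.5)
average = mean(scores)        # R: average <- mean(scores)
first = scores[0]             # R: first <- scores[1]   (R counts from 1, Python from 0)
count = len(scores)           # R: count <- length(scores)

print(average, first, count)
```

The ideas are the same in both languages (store some numbers, take their mean, pick out an item, count them), but the spelling differs: Python assigns with = and uses square brackets for a list, while R typically assigns with <- and builds the equivalent with c().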
Which one should I learn?
But the key question is: which one should I learn? The short answer: Python. In my view, Python is the preferred choice simply because it’s other people’s preferred choice. Industry often gravitates towards Python, and job specifications often list Python, rather than R, as a fundamental requirement. Sometimes Python and R are listed together. However, in my experience, seeing Python listed on its own is more common than seeing R listed on its own. Python also has a larger community than R, which helps the language improve at a quicker rate and makes troubleshooting problems easier.
How do I learn them?
As these are open source, you can just download them and play around with them! However, due to the complex nature of programming, I highly recommend engaging in more formal or structured learning. If you dig around enough, I’m sure you’ll be able to find courses delivered by your university that teach Python and R. These are a great starting point and can give you a ‘flavour’ of how to use these languages. After doing this, I recommend diving into an online training course. Personally, I was able to get a free subscription to DataCamp through my university and can’t recommend it enough. However, other e-learning platforms such as DataQuest and PluralSight, and even YouTube, are great places to start.
My only advice is to stick at it, be patient and understand that it takes time to truly master these languages. I found that I had only developed a reasonable amount of knowledge after committing over 100 hours to learning. For some people it might take longer, especially if you lack the theoretical knowledge of statistics that underpins a lot of the operations performed in these languages. Once you have developed a reasonable level of skill with one language, you can start to learn a second, and if all goes well, you can begin applying for jobs in industry that need these skills. Data scientist and data analyst roles are a great place to start!