It’s time for statistics departments to start supporting their applied students
Statistics departments are failing their applied students. In this post, I have a lot of opinions and give two pieces of advice: statistics departments need to start supporting their applied students, and they need to hire applied faculty.
Tidymodels: tidy machine learning in R
The tidyverse’s take on machine learning is finally here. Tidymodels forms the basis of tidy machine learning, and this post provides a whirlwind tour to get you started.
5 useful R tips from rstudio::conf(2020) - tidy eval, piping, conflicts, bar charts and colors
Last week I had the pleasure of attending rstudio::conf(2020) in San Francisco. Throughout the course of the week I met many wonderful people and learnt many things. This post covers some of the little tips and tricks that I learnt throughout the conference.
Learn to purrr
Purrr is the tidyverse’s answer to apply functions for iteration. It’s one of those packages that you might have heard of, but that seemed too complicated to sit down and learn. Starting with map functions, and taking you on a journey that will harness the power of the list, this post will have you purrring in no time.
Transitioning into the tidyverse (part 2)
This post walks through what base R users need to know for their transition into the tidyverse. Part 2 focuses on the more specialized R packages: tidyr, purrr, readr, lubridate, forcats, etc.
Using the recipes package for easy pre-processing
Having to apply the same pre-processing steps to training, testing and validation data to do some machine learning can be surprisingly frustrating. But thanks to the recipes R package, it’s now super-duper easy. Instead of having five functions and maybe hundreds of lines of code, you can preprocess multiple datasets using a single ‘recipe’ in fewer than 10 lines of code.
A quick guide to developing a reproducible and consistent data science workflow
When you’re learning to code and perform data analysis, it can be overwhelming to figure out how to structure your projects. To help data scientists develop a reproducible and consistent workflow, I’ve put together a short document with some guiding advice.
mutate_all(), select_if(), summarise_at()… what’s the deal with scoped verbs?!
What’s the deal with these mutate_all(), select_if(), and summarise_at() functions? They seem so useful, but there doesn’t seem to be a decent explanation of how to use them anywhere on the internet. Turns out, they’re called ‘scoped verbs’ and hopefully this post will become one of many decent explanations of how to use them!
Which hypothesis test should I use? A flowchart
A flowchart to decide what hypothesis test to use.
Alternatives to grouped bar charts
One of the most common chart types that is simultaneously the most difficult to read is the grouped bar chart. Fortunately, there exist several substantially more effective alternatives that convey the same information without overwhelming our visual cognition abilities.
Understanding Instrumental Variables
Instrumental variables is one of the most mystical concepts in causal inference. For some reason, most of the existing explanations are overly complicated and focus on specific nuanced aspects of generating IV estimates without really providing the intuition for why it makes sense. In this post, you will not find too many technical details, but rather a narrative introducing instruments and why they are useful.
A basic tutorial of caret: the machine learning package in R
R has a large number of packages for machine learning (ML), which is great, but also quite frustrating since each package was designed independently and has very different syntax, inputs and outputs. Caret unifies these packages into a single package with consistent syntax, saving everyone a lot of frustration and time!
Interactive visualization in R
Learn about creating interactive visualizations in R.
Docathon: A Week of Documentation
We’re hosting a week-long docathon over at BIDS.