A quick guide to developing a reproducible and consistent data science workflow

When you're learning to code and perform data analysis, it can be overwhelming to figure out how to structure your projects. To help data scientists develop a reproducible and consistent workflow, I've put together a short document with some guiding advice.

Rebecca Barter

When you’re learning to code and perform data analysis, it can be overwhelming to figure out how to structure your projects. To help data scientists develop a reproducible and consistent workflow, I’ve put together a short GitHub-based document with some guiding advice (https://github.com/rlbarter/reproducibility-workflow). If you’re interested in contributing to or improving this document, please get in touch or, even better, submit a pull request! The document as of writing is shown below.

mutate_all(), select_if(), summarise_at()... what's the deal with scoped verbs?!

What's the deal with these mutate_all(), select_if(), and summarise_at() functions? They look so useful, but there doesn't seem to be a decent explanation of how to use them anywhere on the internet. It turns out they're called 'scoped verbs', and hopefully this post will become one of many decent explanations of how to use them!
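To make the three functions named above a little more concrete, here is a minimal sketch in R, assuming the dplyr package is installed. The data frame `df` and its columns are made up purely for illustration. It keeps the numeric columns with `select_if()`, doubles every remaining column with `mutate_all()`, and averages two columns picked by name with `summarise_at()`.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data frame, used only for illustration
df <- data.frame(
  x = c(1, 2, 3),
  y = c(4, 5, 6),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# select_if(): keep only the columns for which is.numeric() returns TRUE
numeric_cols <- select_if(df, is.numeric)

# mutate_all(): apply the same function to every column (here, doubling)
doubled <- mutate_all(numeric_cols, function(col) col * 2)

# summarise_at(): summarise only the columns named inside vars()
means <- summarise_at(df, vars(x, y), mean)

print(means)
```

The suffix tells you how the columns are chosen: by a predicate (`_if`), by name (`_at`), or all of them (`_all`).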

Rebecca Barter

A quick useful aside: Using shorthand for functions
The _if() scoped variant: perform an operation on variables that satisfy a logical criterion
  select_if()
  rename_if()
  mutate_if()
  summarise_if()
The _at() scoped variant: perform an operation only on variables specified by name
  Select helpers
  rename_at()
  mutate_at()
  summarise_at()
The _all() scoped variant: perform an operation on all variables at once
  rename_all()
  mutate_all()
  summarise_all()
Conclusion

I often find myself wishing that I could apply the same mutate function to several columns in a data frame at once, such as converting all factors to characters, doing something to all columns that have missing values, or selecting all variables whose names end with _important.
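The wishes listed in the excerpt map onto the _if(), _at() and _all() variants roughly as follows. This is a minimal sketch in R, assuming dplyr is installed; the column names (group, score_important, weight_important, notes) are hypothetical. The `~ any(is.na(.))` and `~ . * 100` arguments use the purrr-style `~` formula shorthand for anonymous functions, which is my assumption about what the "shorthand for functions" aside covers.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical example data: a factor column, a column with a missing value,
# and two columns whose names end with "_important"
df <- data.frame(
  group = factor(c("a", "b", "a")),
  score_important = c(10, NA, 30),
  weight_important = c(0.5, 0.2, 0.3),
  notes = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# _if(): convert every factor column to character
df_chr <- mutate_if(df, is.factor, as.character)

# _if() with the ~ shorthand: keep only columns containing at least one NA
df_na <- select_if(df, ~ any(is.na(.)))

# _at() with a select helper: rescale only columns ending in "_important"
df_scaled <- mutate_at(df, vars(ends_with("_important")), ~ . * 100)

# _all(): apply a renaming function to every column name
df_upper <- rename_all(df, toupper)
```

Since dplyr 1.0.0 these scoped verbs are marked as superseded in favour of `across()` (for example, `mutate(df, across(where(is.factor), as.character))`), but they still run.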

Visualizing world happiness

I created an interactive D3 visualization of the annual world happiness survey and how it relates to several variables describing each country's government.

Rebecca Barter

Over the new year I decided to work on my D3.js skills (rather than do the actual work that I probably should’ve done) by submitting an entry to the World Data Visualization competition. The interactive version of the fruits of my labour can be found by clicking here. Below is a static screenshot.

Essentially, the first screen shows each country’s happiness score, with countries colored accordingly using the viridis color palette.