When you’re learning to code and perform data analysis, it can be overwhelming to figure out how to structure your projects. To help data scientists develop a reproducible and consistent workflow, I’ve put together a short GitHub-based document with some guiding advice: https://github.com/rlbarter/reproducibility-workflow. If you’re interested in contributing to or improving this document, please get in touch or, even better, submit a pull request! The document as of writing is shown below.
A quick useful aside: Using shorthand for functions
The _if() scoped variant: perform an operation on variables that satisfy a logical criterion
    select_if(), rename_if(), mutate_if(), summarise_if()
The _at() scoped variant: perform an operation only on variables specified by name
    Select helpers, rename_at(), mutate_at(), summarise_at()
The _all() scoped variant: perform an operation on all variables at once
    rename_all(), mutate_all(), summarise_all()
Conclusion

I often find myself wishing that I could apply the same mutate function to several columns in a data frame at once: for example, to convert all factors to characters, to do something to all columns that have missing values, or to select all variables whose names end with _important.
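As a rough illustration of the three wishes above, here is a minimal sketch using dplyr's scoped verbs. The data frame, its column names, and the helper functions has_na() and fill_zero() are hypothetical and exist only for this example; replacing missing values with 0 is just one arbitrary choice of "something" to do. In recent dplyr releases the scoped verbs are superseded by across(), though they still work.

```r
library(dplyr)

# Hypothetical toy data; the column names are made up for illustration.
df <- data.frame(
  group = factor(c("a", "b", "a")),
  score_important = c(1.5, NA, 3.0),
  notes = factor(c("x", "y", "z"))
)

# _if(): convert every factor column to character.
df_chr <- mutate_if(df, is.factor, as.character)

# _at(): keep only the columns whose names end with "_important".
df_imp <- select_at(df, vars(ends_with("_important")))

# _if() with a custom predicate: act only on columns containing missing values.
has_na <- function(x) any(is.na(x))            # hypothetical helper
fill_zero <- function(x) ifelse(is.na(x), 0, x)  # hypothetical helper
df_filled <- mutate_if(df, has_na, fill_zero)
```

Each call returns a new data frame and leaves df unchanged, so the results can be inspected side by side.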
Over the new year I decided to work on my D3.js skills (rather than do the actual work that I probably should’ve done) by submitting an entry to the World Data Visualization competition. The interactive version of the fruits of my labour can be found by clicking here. Below is a static screenshot.
Essentially, the first screen shows each country’s happiness score, colored accordingly using the viridis color palette.