I graduated with a PhD from UC Berkeley’s statistics department in December. My dissertation consisted of three 100% applied projects (one of which was a piece of open-source software). This is, unfortunately, incredibly rare. Over the past few years, a number of current and prospective statistics PhD students, both at Berkeley and elsewhere, have gotten in touch to ask how I made my way through a statistics PhD by working only on applied projects.
Contents: Select helpers: selecting columns to apply the function to; Using in-line functions with across; A mutate example; A select example

I often find that I want to use a dplyr function on multiple columns at once. For instance, perhaps I want to scale all of the numeric variables at once using a mutate function, or I want to provide the same summary for three of my variables.
Contents: What is tidymodels; Getting set up; Split into train/test; Define a recipe; Specify the model; Put it all together in a workflow; Tune the parameters; Finalize the workflow; Evaluate the model on the test set; Fitting and using your final model; Variable importance

There’s a new modeling pipeline in town: tidymodels. Over the past few years, tidymodels has been gradually emerging as the tidyverse’s machine learning toolkit.