Developing a clean, seamless workflow for data analysis is harder than it sounds, largely because it is almost never explicitly taught. Apparently we are all just supposed to “figure it out for ourselves”. When most of us start our first few analysis projects, we have basically no idea how to structure our files, or even which files we will need to make.
In my previous post, I introduced causal inference as a field interested in estimating the unobservable causal effect of a treatment, i.e. the difference between some measured outcome when an individual is assigned the treatment and the same outcome when that individual is not assigned the treatment. If you’d like to quickly brush up on causal inference, the fundamental issue in making causal inferences, and in particular the troubles that arise in the presence of confounding, I suggest you read my previous post on this topic.
Often in science we want to be able to quantify the effect of an action on some outcome. For example, perhaps we are interested in estimating the effect of a drug on blood pressure. While it is easy to show whether or not taking the drug is associated with a change in blood pressure, it is surprisingly difficult to show that taking the drug actually caused that increase (or decrease).
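To make the gap between association and causation concrete, here is a small simulation sketch of the drug example. Everything in it is made up for illustration: the function name `simulate`, the single confounder (being older), the baseline blood pressures, the probabilities of taking the drug, and the true effect of -5. It is a toy under those assumptions, not an analysis of real data.

```python
import random
import statistics


def simulate(n=20000, true_effect=-5.0, seed=0):
    """Compare a naive observational estimate with a randomized one.

    All numbers here are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    obs_treated, obs_control = [], []
    rct_treated, rct_control = [], []

    for _ in range(n):
        # Assumed confounder: older people have higher blood pressure.
        older = rng.random() < 0.5
        y0 = rng.gauss(140.0 if older else 120.0, 5.0)  # outcome without the drug
        y1 = y0 + true_effect                           # outcome with the drug

        # Observational world: older people are also more likely to take the drug.
        takes_drug = rng.random() < (0.8 if older else 0.2)
        if takes_drug:
            obs_treated.append(y1)
        else:
            obs_control.append(y0)

        # Randomized world: a coin flip decides, independent of age.
        assigned = rng.random() < 0.5
        if assigned:
            rct_treated.append(y1)
        else:
            rct_control.append(y0)

    naive = statistics.mean(obs_treated) - statistics.mean(obs_control)
    randomized = statistics.mean(rct_treated) - statistics.mean(rct_control)
    return naive, randomized


if __name__ == "__main__":
    naive, randomized = simulate()
    print(f"naive observational difference: {naive:.2f}")
    print(f"randomized difference:          {randomized:.2f}")
```

Under these assumptions the drug lowers everyone's blood pressure by 5, yet the naive comparison of drug takers against non-takers should come out positive (about +7 in expectation), because the people taking the drug are mostly the older, higher-pressure group. Only the randomized comparison lands near the true effect. Note that the simulation can see both `y0` and `y1` for each individual; in real data we only ever observe one of them.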