Changes in contraceptive method mix over time
Preamble
I was having fun playing with stacked area graphs in my last post, so I thought for this one I would extend them to a topic area that’s much more in my wheelhouse: contraception. There’s a large survey that I work with a lot - the National Survey of Family Growth (NSFG) - that tracks what contraceptive methods people use, and since it’s administered at fairly regular intervals, you can use it to track shifts over time. It’s a nice one to use because if you make the denominator contraceptive users, the percent of people using each method adds up to 100. I also think the chart (as you’ll see) is a nice case study in A) how to deal with data that has a ridiculous number of categories, and B) how to graphically convey situations where you only have data at specific points in time, but want to show an estimated trend.
The Good Stuff (the code)
So I first load some packages - we’ll get into them later - but they’re essentially there to help me read the data, do a bit of imputation (in this case, to estimate the trend where I don’t have data), and plot.
I’m getting the data from published NSFG reports, as opposed to running anything from the dataset directly – mainly because it’s a lot more straightforward for the purposes of this post. The data are from Table 4 of this report for the years 1982, 1995, and 2002, Table 1 of this report for 1988, and Table 1 of this report for 2008 and 2012. I used Tabula, one of the most amazing pieces of software ever created, to extract the data tables in the PDF reports. These tables all describe current use of a method (in other words, the method the respondent said they were using during the month they were interviewed), among women aged 15-44. It doesn’t take into account whether the respondent was sexually active that month, or why they were using contraception - it’s really just a description of what methods women were using at a snapshot in time.
The data looks like this, with method names as rows, and years as columns (I’m only showing a few years here so the table will fit):
Because I want the denominator to be contraceptive users, as opposed to all women, I prorate the percent distribution of methods to add up to 100%. Normally, I might be pretty cautious about doing something like this, except that what I’m really interested in is how women’s contraceptive method mix has changed over time - not whether women are using contraception more (also, overall levels of contraceptive use actually didn’t change that drastically between 1988 and 2012). I then transpose the dataset (so that years are rows and methods are columns), because that’s how stackpoly, the command we’re going to use to make the stacked area chart, wants to see the data.
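For anyone who wants to see the prorate-and-transpose step spelled out, here is a minimal sketch in base R. The matrix values and the method names (`method_a` and so on) are made-up placeholders, not the published NSFG figures, and `pct_all_women`, `pct_users`, and `method_mix` are just names I picked for illustration.

```r
# Illustrative placeholder values only, NOT the published NSFG figures.
# Rows are methods, columns are survey years; entries are the percent of
# all women aged 15-44 currently using each method.
pct_all_women <- matrix(
  c(10, 12, 11,
    15, 14, 13,
     5,  6,  8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("method_a", "method_b", "method_c"),
                  c("1982", "1988", "1995"))
)

# Prorate: divide each column by its total so that, within a year,
# the methods sum to 100% of contraceptive users.
pct_users <- prop.table(pct_all_women, margin = 2) * 100

# Transpose so that years are rows and methods are columns.
method_mix <- t(pct_users)

print(round(method_mix, 1))
```

After this, every row of `method_mix` is one survey year and sums to 100, which is the shape a stacked area chart needs.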
The problem is that now we still only have data points for the years with data. This would be fine if we wanted to do a set of stacked bar charts, but sometimes it’s nice to get a sense of overall trends (and stacked bar charts can be misleading too, because they force the data years to be evenly spaced - which can make some trends look much sharper than they actually are). So we need to interpolate, which is basically using the information from the data we have to guess at the data we don’t have. Often people use linear interpolation, which is essentially just drawing a straight line between points; I’m using spline interpolation here, which uses a bit of fancy (but not that fancy) math to draw curved lines between points, guessing at the correct shape of the overall curve based on the information we have (that sound is 10 million mathematicians rolling over in their graves at my garbled definition).
Codewise, though, it’s pretty simple. I basically add extra rows for the extra years, tell R that it is a time series, and then use a cool interpolation function that comes with the imputeTS package, which I loaded above. Thank god that this was written already, since I spent a day failing to code it from scratch (which I think has to do with me being dumb at coding, not it being so complicated) before I realized that someone already had.
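Here is a rough sketch of those three steps (pad with empty years, declare a time series, interpolate). To keep it free of package dependencies I am using base R's `spline()` as a stand-in for the imputeTS call; in recent imputeTS versions I believe the equivalent is `na_interpolation(mix_ts, option = "spline")`. The shares below are made-up placeholders rather than NSFG estimates, and `fill_spline` is a hypothetical helper name.

```r
# Illustrative placeholder shares (percent of users), NOT NSFG estimates.
survey_years <- c(1982, 1988, 1995, 2002, 2008, 2012)
method_mix <- cbind(
  method_a = c(40, 35, 30, 28, 25, 22),
  method_b = c(35, 38, 40, 37, 35, 33),
  method_c = c(25, 27, 30, 35, 40, 45)
)

# Add a row for every calendar year, leaving NA where there was no survey.
all_years <- seq(min(survey_years), max(survey_years))
padded <- matrix(NA_real_, nrow = length(all_years), ncol = ncol(method_mix),
                 dimnames = list(as.character(all_years), colnames(method_mix)))
padded[as.character(survey_years), ] <- method_mix

# Tell R this is an annual time series starting in the first survey year.
mix_ts <- ts(padded, start = min(all_years), frequency = 1)

# Hypothetical helper: fill the gaps in one series with a cubic spline
# through the observed points (base-R stand-in for the imputeTS function).
fill_spline <- function(y, years) {
  observed <- !is.na(y)
  spline(years[observed], y[observed], xout = years, method = "fmm")$y
}

# Interpolate each method's column separately.
interpolated <- apply(mix_ts, 2, fill_spline, years = all_years)
rownames(interpolated) <- as.character(all_years)
```

The curve passes through the observed values in the survey years, and only the in-between years are estimated. One thing to watch with splines is that nothing stops an estimated share from dipping below zero for a rare method, so it is worth checking `range(interpolated)` before plotting.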
Finally, we make the graph! I spent a lot of time futzing with the colors, because there are 13 categories of contraceptive methods (which is way too many), but I didn’t want to collapse any, because part of what’s interesting to me is how methods that were once more common (like the diaphragm) faded out over time. Instead, I tried to find similar colors for broad categories (purple for sterilization, green for long-acting methods, for example), and then mess with the shading for the specific methods for each broad category. I also tried very hard to pick colors that didn’t look like a 1992 Windows desktop screensaver.
The other thing I did was to add dotted lines for the years with actual data (well, except for the beginning and end years), to try to make clear that the precise shape of the curves is a bit of a guess. With line graphs, I’ve often seen people add points to the line that denote ‘actual’ values, but with a stacked area chart that starts to look crazy fast. I think this is pretty successful (and I like the way it looks), but would be curious to hear other solutions people have used. Also to try to make it extra clear, I only added x-axis labels for years with actual data.
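To show the dotted-line and axis-label idea in isolation, here is a small sketch that draws the stacked areas with base graphics `polygon()` rather than `stackpoly`, purely so it runs without any packages. The function name `plot_method_mix`, the toy shares, and the three colors are all placeholders of mine, and it assumes at least two methods.

```r
# Hypothetical helper: stacked area chart with markers at surveyed years.
# `shares` has one row per year and one column per method (percent of users).
plot_method_mix <- function(years, shares, survey_years, cols) {
  tops <- t(apply(shares, 1, cumsum))                     # upper edge of each band
  bottoms <- cbind(0, tops[, -ncol(tops), drop = FALSE])  # lower edge of each band

  # Empty frame; suppress the default x axis so we can label it ourselves.
  plot(range(years), c(0, 100), type = "n", xaxt = "n", xaxs = "i", yaxs = "i",
       xlab = "Survey year", ylab = "Percent of contraceptive users")

  # One filled polygon per method, stacked bottom to top.
  for (j in seq_len(ncol(shares))) {
    polygon(c(years, rev(years)), c(tops[, j], rev(bottoms[, j])),
            col = cols[j], border = NA)
  }

  # Dotted vertical lines at survey years, skipping the first and last year.
  interior <- setdiff(survey_years, range(years))
  abline(v = interior, lty = "dotted")

  # Label the x axis only at years with actual data.
  axis(1, at = survey_years, labels = survey_years)

  invisible(tops)
}

# Toy data (illustrative only): three methods whose shares sum to 100.
years <- 1982:2012
share_a <- seq(40, 22, length.out = length(years))
share_b <- rep(35, length(years))
shares <- cbind(method_a = share_a, method_b = share_b,
                method_c = 100 - share_a - share_b)

tops <- plot_method_mix(years, shares,
                        survey_years = c(1982, 1988, 1995, 2002, 2008, 2012),
                        cols = c("#6a51a3", "#41ab5d", "#fd8d3c"))
```

The dotted lines and the sparse axis labels both point at the same thing: the only places the chart is reporting data rather than an estimate.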
So without further ado, the graph! I’m not going to do any interpretation here, because I think part of the fun of this kind of dense graph is exploring it on your own – as always, leave any feedback (or errors in my code!) in the comments.