More On Slopegraphs

About a week ago Bob Rudis created a nice blog post that I saw on my R Bloggers feed that simultaneously:

I happened to be on vacation at the time, but as soon as I got back and caught up I vowed to follow up, since slopegraphs have always fascinated me and I wrote a function to make them about a year back. I wanted to look at Bob’s post in detail after very quickly agreeing with his premise that a slopegraph was a much better choice than a “dumbbell chart”. So this post is about what I learned and the adjustments I made to my own function.

This post assumes that you’ve read the earlier posts.

Let’s quickly recreate the dataset Bob created, keeping it simple and choosing to make it a dataframe rather than a tibble.
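A base R sketch of that dataset follows. The values are read back from the ranks printed in the reshaped output further down, and the column names match the factor levels that get recoded there. The plain data.frame() constructor is an assumption, since any constructor that yields the same three columns would do.

```r
# Reconstructed from the ranks printed in the long-format output;
# the use of data.frame() here is an assumption, not the original code.
thedata <- data.frame(
  topic = c("Health care", "Climate change", "Education", "Economics",
            "Science", "Technology", "Business", "National Security",
            "Politics", "Sports", "Immigration", "Arts & entertainment",
            "U.S. foreign policy", "Religion"),
  actually_read    = c(7, 5, 11, 6, 10, 14, 13, 1, 2, 3, 4, 8, 9, 12),
  say_want_covered = c(1, 2, 3, 4, 7, 8, 11, 5, 10, 14, 6, 13, 9, 12),
  stringsAsFactors = FALSE  # keep topic as character, not factor
)

str(thedata)
```

Each rank column is a permutation of 1 to 14, which is a quick sanity check that no value was mistyped.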

When you look at Bob’s post there’s actually a lot of code in there to make a very nice graphic. Being extraordinarily lazy, I wrote my function to get a slopegraph with the least amount of work possible. The first step, which is unavoidable if you want to make use of ggplot2, is to reshape the data into a “longer” format. We’ll use reshape2::melt and keep the topic column, but collapse the other two columns into a factor called Saydo and put the actual “rank” into a column called Rank. Since “actually_read” and “say_want_covered” are now factor levels instead of column names, we can use forcats::fct_recode to give them much nicer labels when we make our plot. Voila, a new dataframe called temp.

# Wide to long: one row per topic per ranking source
temp <- reshape2::melt(data = thedata,
                       id.vars = "topic",
                       variable.name = "Saydo",
                       value.name = "Rank")
# Replace the old column names with reader-friendly factor labels
temp$Saydo <- forcats::fct_recode(temp$Saydo,
                                  "Actually read" = "actually_read",
                                  "Say they want" = "say_want_covered")

temp
##                   topic         Saydo Rank
## 1           Health care Actually read    7
## 2        Climate change Actually read    5
## 3             Education Actually read   11
## 4             Economics Actually read    6
## 5               Science Actually read   10
## 6            Technology Actually read   14
## 7              Business Actually read   13
## 8     National Security Actually read    1
## 9              Politics Actually read    2
## 10               Sports Actually read    3
## 11          Immigration Actually read    4
## 12 Arts & entertainment Actually read    8
## 13  U.S. foreign policy Actually read    9
## 14             Religion Actually read   12
## 15          Health care Say they want    1
## 16       Climate change Say they want    2
## 17            Education Say they want    3
## 18            Economics Say they want    4
## 19              Science Say they want    7
## 20           Technology Say they want    8
## 21             Business Say they want   11
## 22    National Security Say they want    5
## 23             Politics Say they want   10
## 24               Sports Say they want   14
## 25          Immigration Say they want    6
## 26 Arts & entertainment Say they want   13
## 27  U.S. foreign policy Say they want    9
## 28             Religion Say they want   12

Once we get the data in the right shape, I tried to make newggslopegraph as simple and intuitive as possible. I love working with ggplot2, but I will admit it can get quite complex. So creating the default plot takes nothing more than the dataframe and the names of its three columns. That was pretty painless, wasn’t it? But clearly there’s a lot of room for tweaking! Let’s make it better!
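For reference, the default call comes down to the four required arguments. This is a minimal sketch, assuming newggslopegraph() is loaded from the CGPfunctions package. It rebuilds temp inline so the block runs on its own, and the requireNamespace() guard is only there so the sketch does not fail where the package is missing.

```r
# Long-format data as printed above: one row per topic per ranking source.
topics <- c("Health care", "Climate change", "Education", "Economics",
            "Science", "Technology", "Business", "National Security",
            "Politics", "Sports", "Immigration", "Arts & entertainment",
            "U.S. foreign policy", "Religion")
sources <- c("Actually read", "Say they want")

temp <- data.frame(
  topic = rep(topics, times = 2),
  Saydo = factor(rep(sources, each = length(topics)), levels = sources),
  Rank  = c(7, 5, 11, 6, 10, 14, 13, 1, 2, 3, 4, 8, 9, 12,
            1, 2, 3, 4, 7, 8, 11, 5, 10, 14, 6, 13, 9, 12),
  stringsAsFactors = FALSE
)

# Assumption: the function lives in CGPfunctions. Everything other than
# the four arguments below is left at its default.
if (requireNamespace("CGPfunctions", quietly = TRUE)) {
  CGPfunctions::newggslopegraph(dataframe = temp,
                                Times = Saydo,
                                Measurement = Rank,
                                Grouping = topic)
}
```

Leaving the title, subtitle and caption unset is deliberate here, since the placeholders are what prompt the tweaks that follow.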

Whole books can and have been written just on the issue of graphic design, so I’m not going to try to summarize it all in one little blog post. I will, however, for the impatient reader, immediately take care of a few key things:

1. Titles, subtitles and captions are important! Don’t ignore them or shortchange them. You’ll notice that since we didn’t initially specify them, placeholders appear. That’s a shameless way of making you think about them, even if you eventually decide to turn them “off” (read the documentation).
2. The default is that every line is its own color. That’s seldom a good choice for telling a story unless the number of topics (a.k.a. Grouping) is very small. For now let’s make them all “black” and come back to this in a bit.
3. By default the measurement is treated as a real number, so the highest values are at the top of the graph. It makes more sense here to reverse the scale and put the highest ranked “1” at the top, which is what ReverseYAxis = TRUE does. If we needed or wanted to, ReverseXAxis might be useful.

Our second attempt looks like this:

newggslopegraph(dataframe = temp,
                Times = Saydo,
                Measurement = Rank,
                Grouping = topic,
                ReverseYAxis = TRUE,
                Title = "14 Topics Ranked by What Americans Read vs Want Covered",
                SubTitle = "'Read' rank from Parse.ly May 2019 data.\n'Want covered' rank from Axios/SurveyMonkey poll conducted May 17-20, 2019",
                Caption = "Source: Axios",
                LineColor = "black")  # every line black for now

Alright, that’s looking a little bit better for basic layout. But it doesn’t yet tell the reader a story or focus their attention on the message we want to convey. To be honest I’m not a huge fan of adding a lot of annotations to a plot, so let’s first try to catch the reader’s attention by using color selectively.

As the name implies, slopegraphs get the reader to attend to relative differences in slope. Right now our choice of “black” as the only color is marginally better than our original multicolor mess, but it still falls far short of conveying a message.

The LineColor parameter is quite flexible. The default is suitable for a small number of topics, and a single color can be the right choice on occasion, but we can also pass it a character vector of colors that is as customized as we like. For example, a two-element vector of red and black would be recycled to create an alternating pattern. We could even build a named list that associates a color with each of the topic areas if we desired (see the vignette for an example). But right now that is too much effort, and I’d like to handle this by algorithm, not by manual entry.

As a starting point, let’s assume we’d like to get the reader to focus on which topics increase in rank, decrease in rank or stay the same. We’ll color increases black, decreases red and things that remain level light gray. We can accomplish that with a short series of data-manipulation verbs that leave us with a color vector, colorvect, holding one color per topic. It’s then trivial to pass our color vector to LineColor.

While we’re at it we can showcase some of the other formatting options, like changing font sizes for the labels. DataLabelPadding is important if you are likely to have datapoints close together (see the vignette for the cancer data), but in this case we can be more generous since ranks won’t overlap.

newggslopegraph(dataframe = temp,
                Times = Saydo,
                Measurement = Rank,
                Grouping = topic,
                ReverseYAxis = TRUE,
                DataTextSize = 3.5,
                YTextSize = 4,
                XTextSize = 16,
                DataLabelPadding = .2,
                Title = "Topic Rankings Compared Between\nWhat Americans Actually Read vs Want Covered",
                SubTitle = "'Actually Read' rank from Parse.ly May 2019 data.\n'Want covered' rank from Axios/SurveyMonkey poll conducted May 17-20, 2019",
                Caption = "Source: Axios",
                LineColor = colorvect)  # one color per topic

Very nice looking, but I think it is still too crowded with colors. Let’s adjust our coloring to highlight only the larger rank differences. It’s a matter of personal taste, but it’s easy to adjust our little script and test, rinse and repeat until we’re happy. Let’s adjust so that changes of 4 or more ranks in either direction are highlighted and the rest are gray. Then we can run the same plotting lines again.

newggslopegraph(dataframe = temp,
                Times = Saydo,
                Measurement = Rank,
                Grouping = topic,
                ReverseYAxis = TRUE,
                DataTextSize = 3.5,
                YTextSize = 4,
                XTextSize = 16,
                DataLabelPadding = .2,
                Title = "Topic Rankings Compared Between\nWhat Americans Actually Read vs Want Covered",
                SubTitle = "'Actually Read' rank from Parse.ly May 2019 data.\n'Want covered' rank from Axios/SurveyMonkey poll conducted May 17-20, 2019",
                Caption = "Source: Axios",
                LineColor = colorvect)  # recomputed with the stricter rule

Personally, I think that even 7 topics may be too many, but hopefully you’re getting the point that while we’re not losing any information, we’re making it easier for the reader to focus on the big changes in the data. It’s easy to discern the pattern, whether it’s answering a simple question, such as what is the number one thing they say they want to read about (Health care), or a more complex question, such as which topic has the biggest disparity (Sports).
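The two coloring rules described above can be sketched in base R as follows. The helper rank_colors() is hypothetical and written only for illustration. The direction convention is also an assumption: change is measured from “Actually read” to “Say they want”, and moving toward rank 1 counts as an increase. The result is a character vector named by topic, which is one plausible shape for the color vector the plotting calls expect.

```r
topics <- c("Health care", "Climate change", "Education", "Economics",
            "Science", "Technology", "Business", "National Security",
            "Politics", "Sports", "Immigration", "Arts & entertainment",
            "U.S. foreign policy", "Religion")
actually_read    <- c(7, 5, 11, 6, 10, 14, 13, 1, 2, 3, 4, 8, 9, 12)
say_want_covered <- c(1, 2, 3, 4, 7, 8, 11, 5, 10, 14, 6, 13, 9, 12)

# Assumption: negative difference = the topic moves toward rank 1 (a rise).
difference <- say_want_covered - actually_read

# Hypothetical helper: color by direction of change, graying out any
# change smaller than `threshold` ranks.
rank_colors <- function(difference, labels, threshold = 1) {
  colors <- ifelse(difference <= -threshold, "black",
            ifelse(difference >= threshold, "red", "lightgray"))
  setNames(colors, labels)
}

# First rule: any change at all is colored, ties are gray.
colorvect_all <- rank_colors(difference, topics)

# Second rule: only changes of 4 or more ranks are colored.
colorvect <- rank_colors(difference, topics, threshold = 4)

# How many topics stay highlighted under the stricter rule
sum(colorvect != "lightgray")
```

Raising or lowering threshold is the “rinse and repeat” knob: the plotting call stays the same and only the vector passed as LineColor changes.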

Use titles, subtitles and captions well

One thing we can do to make our message clearer is make better use of the title and subtitle areas. It seems simple but is too often forgotten. While we’re at it I’ll highlight a couple of new capabilities I added to the function:

1. The ability to choose from a select number of themes. In this case, Bob Rudis’s theme (ThemeChoice = "ipsum").
2. Control over the justification of the titles, subtitles and caption.

But the most important change here, IMHO, is simply choosing words for the title and subtitle that convey what we want the reader to look for in the plot or think about.

newggslopegraph(dataframe = temp,
                Times = Saydo,
                Measurement = Rank,
                Grouping = topic,
                ReverseYAxis = TRUE,
                DataTextSize = 3.5,
                YTextSize = 3.2,
                XTextSize = 14,
                DataLabelPadding = .2,
                Title = "Americans Don't Actually Read the News They Say They Want",
                SubTitle = "Many sharp differences in rankings in both directions. Hypocrisy, laziness or gratification?",
                Caption = "Source: Rud.is \nMakeover by ",
                LineColor = colorvect,
                ThemeChoice = "ipsum",
                TitleTextSize = 18,
                SubTitleTextSize = 12,
                SubTitleJustify = "right")

The same plot in Wall Street Journal style (ThemeChoice = "wsj"):

newggslopegraph(dataframe = temp,
                Times = Saydo,
                Measurement = Rank,
                Grouping = topic,
                ReverseYAxis = TRUE,
                ReverseXAxis = TRUE,
                DataTextSize = 3.5,
                YTextSize = 4,
                XTextSize = 13,
                DataLabelPadding = .2,
                Title = "Americans Don't Actually Read the News They Say They Want",
                SubTitle = "Many sharp differences in rankings in both directions.\nHypocrisy or laziness or gratification?",
                Caption = "Source: Rud.is \nMakeover by ",
                LineColor = colorvect,
                ThemeChoice = "wsj",
                TitleTextSize = 15,
                CaptionTextSize = 6,
                SubTitleTextSize = 11,
                SubTitleJustify = "right")

I’m not actually sure I like it any better, but it does demonstrate the capability.

Source: R-bloggers.com
