Which type of visualization is best when you want to compare proportions in a large volume of data?

    Three Key Charts for Visualizing Proportion Data

    Image by Monfocus from Pixabay

    Proportion data examples

    Whatever your application of data analytics & data science, there are proportions everywhere. Proportions are all about understanding the different parts that make up a whole.

    Proportions are pretty much just counts of something across the levels of a categorical variable, each read as a share of the total. That could be the number of customers across different industries, the number of sales calls in different geographies, the number of activities across various activity types, or the number of ice cream cones sold in various flavors. If you can count it and break it into groups, then you’ve got proportion data!
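
    To make the "count it and break it into groups" idea concrete, here is a minimal base R sketch using R's built-in mtcars dataset. The names cyl_counts and cyl_props are only illustrative and do not appear elsewhere in this post.

```r
# Count the cars in each cylinder group (the categorical variable)
cyl_counts <- table(mtcars$cyl)

# Turn the counts into proportions of the whole; these sum to 1
cyl_props <- prop.table(cyl_counts)

print(cyl_counts)
print(round(cyl_props, 2))
```

    The counts answer "how many in each group", while the proportions answer "what share of the whole", which is what the charts in this post try to show.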

    Basic proportion visualization

    Whether or not you’re familiar with the idea of ‘exploratory data analysis’, simple plotting of basic statistics is very helpful to any analysis, especially while you are establishing a foundation of understanding that will inform your more complex analysis.

    Sweet visualizations

    I am going to break down three visualization types for analyzing proportions that will prove very useful: pie charts, waffle charts, and bar charts (imagine that they’re actually maple bar or candy bar charts for the sake of the ‘sweets’ theme).

    Pie Time

    Pitfalls of a pie chart

    • Pie charts display proportions as angles, and offset angles at that, which can make them pretty tough to interpret.
    • Once you get more than 3–5 classes in a given pie, it is pretty difficult to compare relative proportions, which is the whole purpose here.
    • OK, let's say you can get an idea of the general allotment for any given level of your categorical variable. You still often lack precision: it is hard to judge exactly how large the gap between any two values is.

    Redeeming qualities of using pie charts

    • Conversely, pie charts are great for screen real estate. Rather than taking up a ton of room, they pack a lot of information into a small space.
    • Depending on your audience, a pie chart can make it very easy for people unfamiliar with the data to absorb an idea quickly.

    Let's get to it!

    From the ground up, using the mtcars dataset, let's build a pie chart.

    First things first: install and load ggplot2 with install.packages('ggplot2') and then library(ggplot2), and you’re off to the races.

    A quick breakdown of ggplot:

    • You first pass in the dataframe you’re working with, in this case mtcars.
    • Then you specify the aesthetics with aes(), which is pretty much where you want different variables to show up on the plot.
    • The first of these is x: whatever your categorical variable, your bucket, your container, your ice cream flavor, add it there.

    From here, add a geom such as geom_bar() at the end to tell ggplot exactly what type of chart you’d like to see. We’ll jump into the syntax, but with ggplot you effectively create the visualization object and then tell that object how you want it drawn.

    First, to give you a quick idea of the data: below we group by the cylinders variable (cyl) and count the number of records in each group.

    library(dplyr)  # provides %>%, group_by() and summarise()

    counts <- mtcars %>%
      group_by(cyl) %>%
      summarise(n = n())

    Let's throw this into a pie!

    ggplot(counts, aes(x = 1, y = n, fill = factor(cyl))) +
      geom_col() +
      coord_polar(theta = 'y')

    Boom! There’s your first pie chart. You’ll see that whatever categorical variable you’re grouping by goes into the fill aesthetic, and the count, or n as I’ve written it, goes into the y aesthetic. Wrapping cyl in factor() makes ggplot treat it as a set of categories with one color each, rather than as a continuous number with a color gradient.

    You may also notice the geom_col() command as well as coord_polar().

    To give an idea of the purpose of coord_polar(), I’ll run this with only geom_col():

    ggplot(counts, aes(x = 1, y = n, fill = factor(cyl))) +
      geom_col()

    As you can see, this is a single stacked bar showing the relative proportions. Adding coord_polar(theta = 'y') wraps this bar into a pie chart.
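
    One way to soften the precision pitfall mentioned earlier is to print the percentage on each slice. The following is only a sketch, not part of the original walkthrough: the label column and the use of geom_text() with position_stack() and theme_void() are my own additions.

```r
library(dplyr)
library(ggplot2)

# Count cars per cylinder group and build a percentage label for each
counts <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n()) %>%
  mutate(label = paste0(round(n / sum(n) * 100), "%"))

pie <- ggplot(counts, aes(x = 1, y = n, fill = factor(cyl))) +
  geom_col() +
  # position_stack(vjust = 0.5) centres each label within its slice
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  labs(fill = "cyl") +
  theme_void()  # drop the axes, which carry no meaning on a pie

print(pie)
```

    The labels do the precise work that angles cannot, while the slices still give the quick overall impression.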

    A great alternative to pie? Waffles!

    OK, so you don’t love pie. Waffle charts are an excellent alternative. While waffle charts are similar to pie charts, they encode each level, class or value of a categorical variable as a proportion of squares in a grid.

    Pitfalls of a waffle chart

    • Similar to pie charts, waffle charts quickly get bogged down when you include too many classes.
    • Definitely don’t try to facet waffle or pie charts. Faceting does not lend itself to a reasonable comparison of the ‘relative proportion’, which is the whole purpose.

    To prep your data for a waffle chart, scale the counts to whole numbers that add up to 100, so that each square stands for one percent. For this we’ll use dplyr (install.packages('dplyr'), library(dplyr)); the waffle() function itself comes from the waffle package (install.packages('waffle'), library(waffle)).

    What you’ll see below is that we group our dataset by our categorical variable, then summarise according to the counts, or n(). From there, we create a new variable called percent using mutate(). The big thing here is that inside mutate() we are creating this scaled-to-100 value.

    We’ll then set up the names for case_counts and run waffle().

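    library(dplyr)
    library(waffle)  # install.packages('waffle')

    count <- mtcars %>%
      group_by(cyl) %>%
      summarise(n = n()) %>%
      mutate(percent = round(n / sum(n) * 100))  # scale the counts to percentages

    # waffle() expects a named vector: one value per category
    case_counts <- count$percent
    names(case_counts) <- count$cyl
    waffle(case_counts)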
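
    One caveat: the round() step happens to add up to exactly 100 for mtcars, but in general rounded percentages can total 99 or 101. If that matters for your grid, a largest-remainder rounding is one option. The helper below is a hypothetical sketch of my own (round_to_100 is not part of dplyr or waffle).

```r
# Hypothetical helper: whole-number percentages that always sum to 100
round_to_100 <- function(n) {
  raw <- n / sum(n) * 100        # exact percentages
  base <- floor(raw)             # round everything down first
  leftover <- 100 - sum(base)    # squares still to hand out
  # Give the leftover squares to the largest fractional parts
  idx <- order(raw - base, decreasing = TRUE)[seq_len(leftover)]
  base[idx] <- base[idx] + 1
  base
}

# Three equal groups: plain round() gives 33 + 33 + 33 = 99
thirds <- c(1, 1, 1)
print(sum(round(thirds / sum(thirds) * 100)))
print(sum(round_to_100(thirds)))

# Cylinder counts from mtcars
print(round_to_100(as.vector(table(mtcars$cyl))))
```

    The result can be named and passed to waffle() the same way as case_counts.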

    Ok we’re on our way!

    Let's wrap it up with bar charts

    For a lot of things, bars just work better for comparing one value directly against another. Let's unroll our pie and throw it into bars. Also take note that this is not a histogram: we are treating the cylinder count as a categorical variable.

    library(ggplot2)
    ggplot(mtcars, aes(x = as.factor(cyl))) +
    geom_bar()
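
    The chart above puts raw counts on the y axis. If you would rather read proportions straight off the axis, one option is to let ggplot compute each bar's share. This is a sketch that assumes ggplot2 3.3.0 or later (for after_stat()); the axis labels are my own.

```r
library(ggplot2)

prop_bars <- ggplot(mtcars, aes(x = as.factor(cyl))) +
  # after_stat(count) is the per-bar count computed by geom_bar();
  # dividing by the total turns it into a share of all cars
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "cyl", y = "share of cars")

print(prop_bars)
```

    The bar heights keep the same relative order as before; only the axis changes from counts to percentages.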

    A best practice for stacked bars: don’t make them in isolation, and keep in mind that they are not nearly as useful once you go past three categories.

    The key is that the wholes being compared all share the same y axis.

    Something to keep in mind for bars in general is that anything far beyond three variables will be a lot more difficult to interpret.
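
    As a sketch of stacked bars that share one y axis, the example below compares the cylinder mix across gear counts. The choice of gear as the second variable is my own, made only because it is another small categorical column in mtcars.

```r
library(ggplot2)

stacked <- ggplot(mtcars, aes(x = factor(gear), fill = factor(cyl))) +
  # position = "fill" rescales every bar to the same 0-1 height,
  # so each bar is a whole and the segments are its proportions
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "gear", y = "share of cars", fill = "cyl")

print(stacked)
```

    Because every bar runs from 0% to 100%, the segments can be compared across bars as proportions rather than raw counts.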

    In order to reorder the bars of your bar chart, make sure the categorical variable is a factor, then set its levels in the order you want them displayed.

    ggplot orders the bars and legend by the levels of the factor (plain text columns are ordered alphabetically). To override this, turn the cyl column into a factor with the levels in the order we want our plot to use.

    library(dplyr)
    # cyl takes the values 4, 6 and 8; list the levels in the order you want them plotted
    mtcars <- mtcars %>%
      mutate(cyl = factor(cyl, levels = c('4', '6', '8')))

    This can often play a big part in organizing your plots for interpretability.
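
    A common ordering is from the most to the least frequent category. Here is a minimal base R sketch of that idea; the names cyl_order and mt are illustrative, and mt is a copy so the built-in mtcars stays untouched.

```r
library(ggplot2)

# Category names sorted from most to least frequent
cyl_order <- names(sort(table(mtcars$cyl), decreasing = TRUE))

mt <- mtcars  # work on a copy of the built-in data
mt$cyl <- factor(mt$cyl, levels = cyl_order)

# The bars now appear in the order of the factor levels
ordered_bars <- ggplot(mt, aes(x = cyl)) +
  geom_bar()

print(ordered_bars)
```

    For mtcars this puts the 8-cylinder bar first, followed by 4 and then 6.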

    Conclusion

    Enjoy getting your hands dirty with proportion charts and other visualizations of categorical data. As you familiarize yourself with different charting techniques, it will serve you well to think of each chart type as a tool suited to a particular data type and situation.

    Happy Data science-ing! And don’t forget to follow my blog to get more blogs related to machine learning, data visualization, data wrangling, and all things data science! datasciencelessons.com.

    Which type of Visualisation is best when you want to compare proportions in a large volume of data with multiple categories and subcategories?

    Treemaps. A treemap represents data as rectangles in hierarchical form. Treemaps generally show proportions in distinct colors and sizes to enable users to more easily understand large volumes of data. This type of chart is great when data involves multiple subcategories that are difficult to analyze in bar charts.
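
    For readers following along in R, here is a rough treemap sketch. It assumes the treemapify package is installed (install.packages('treemapify')), which the post above does not use, and it treats gear as the category and cyl as the subcategory purely for illustration.

```r
library(dplyr)
library(ggplot2)
library(treemapify)  # assumption: an extra package, not used elsewhere in this post

# One row per gear/cyl combination, with its count in column n
tree_counts <- mtcars %>%
  count(gear, cyl) %>%
  mutate(gear = paste(gear, "gears"), cyl = paste(cyl, "cyl"))

tree <- ggplot(tree_counts,
               aes(area = n, fill = gear, subgroup = gear, label = cyl)) +
  geom_treemap() +                   # rectangle area is proportional to n
  geom_treemap_subgroup_border() +   # outline each gear group
  geom_treemap_text(colour = "white", place = "centre")

print(tree)
```

    Each large block is a category and the tiles inside it are its subcategories, sized by their share of the total.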

    Which chart type is best when you are comparing proportions easily?

    Bar charts are good for comparisons, while line charts work better for trends.

    How do you visualize a proportion of data?

    Waffle charts are an excellent alternative. While waffle charts are similar to pie charts, they actually encode each level, class or value of a categorical variable as a proportion of squares.

    Which visualization type is best for comparing data points?

    Bar graphs can help you compare data between different groups or to track changes over time. Bar graphs are most useful when there are big changes or to show how one group compares against other groups. The example above compares the number of customers by business role.