Which type of visualization is best when you want to compare proportions in a large volume of data

    13 Powerful Ways to Visualize Your Data [with Examples]

    By Sisense Team

    Three Key Charts for Visualizing Proportion Data

    Image by Monfocus from Pixabay

    Proportion data examples

    Whatever your application of data analytics and data science, proportions are everywhere. Proportions are all about understanding how the different parts make up a whole.

    Proportion data is essentially a count of something within each level of a categorical variable, read relative to the total. That could be the number of customers in different industries, the number of sales calls in different geographies, the number of activities of various types, or the number of ice cream cones sold in various flavors. If you can count it and break it into groups, then you've got proportion data!
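As a quick illustration of "count it and break it into groups", here is a minimal sketch in base R. It assumes the built-in mtcars dataset and treats the cyl column (number of cylinders) as the categorical variable; the names counts and shares are only illustrative.

```r
# Count rows per category, then express each count as a share of the total.
counts <- table(mtcars$cyl)      # number of cars in each cylinder class
shares <- prop.table(counts)     # each count divided by the overall total

print(counts)
print(round(shares, 3))
```

The shares always add up to 1, which is what makes them "parts of a whole".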

    Basic proportion visualization

    Whether or not you're familiar with the idea of 'exploratory data analysis', simple plots of basic statistics are very helpful in any analysis, especially while you are building the foundation of understanding that will inform your more complex work.

    Sweet visualizations

    I am going to break down three visualization types for analyzing proportions that will prove very useful: pie charts, waffle charts, and bar charts (imagine that they're actually maple bar or candy bar charts, for the sake of the 'sweets' theme).

    Pie Time

    Pitfalls of a pie chart

    • Pie charts display proportions as angles, and offset angles at that, which can make them pretty tough to interpret.
    • Once you get more than 3–5 classes in a given pie, it is pretty difficult to compare relative proportions, which is the whole purpose here.
    • You can get an idea of the general allotment for any given level of your categorical variable, but you often lack precision: it is hard to judge exactly how large the gap between any two values is.

    Redeeming qualities of using pie charts

    • Conversely, pie charts are great for screen real estate. Rather than taking up a ton of room, they pack a lot of information into a small space.
    • Depending on your audience, a pie chart can make it very easy for non-technical groups to absorb an idea quickly.

    Let's get to it!

    Let's build a pie chart from the ground up using the mtcars dataset.

    First things first, install and load ggplot2 (install.packages('ggplot2'), then library(ggplot2), and then you're off to the races).

    A quick breakdown of ggplot:

    • You first pass in the dataframe you're working with, in this case mtcars.
    • Then you specify aes()-thetics, which pretty much say where you want the different variables to show up on the plot.
    • The first of these is x, so whatever your categorical variable, your bucket, your container, your ice cream flavor, add it there.

    From here, add a geom such as geom_bar() at the end to tell ggplot exactly what type of chart you'd like to see. (geom_bar() counts the rows for you; the examples that follow use geom_col() because the counts are computed beforehand.) We'll jump into the syntax, but with ggplot you effectively create the visualization object and then tell that object how you want to draw it.
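To make the data / aes() / geom pattern concrete, here is a small sketch that draws the same bar chart two ways. It assumes ggplot2 is installed; the names p_bar, p_col and cyl_counts are illustrative and not from the original post.

```r
library(ggplot2)

# geom_bar() counts the rows in each category for you.
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# geom_col() plots a y value that has already been computed.
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))  # columns: cyl, Freq
p_col <- ggplot(cyl_counts, aes(x = cyl, y = Freq)) +
  geom_col()

print(p_bar)
print(p_col)
```

Both plots show the same bar heights; the only difference is whether ggplot or you did the counting.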

    First, to give you a quick idea of the data: below, we group by the cylinders variable (cyl) and count the number of records in each group.

    library(dplyr)  # supplies %>%, group_by() and summarise()
    counts <- mtcars %>%
      group_by(cyl) %>%
      summarise(n = n())

    Let's throw this into a pie!

    ggplot(counts, aes(x = 1, y = n, fill = cyl)) +
      geom_col() +
      coord_polar(theta = 'y')

    Boom! There's your first pie chart. You'll see that whatever categorical variable you're grouping by goes into the fill aesthetic, which sets the color of each slice, and the count (n, as I've written it) goes into the y aesthetic.

    You may also notice the geom_col() command as well as coord_polar().

    To give an idea of the purpose of coord_polar(), I'll run this with only geom_col():

    ggplot(counts, aes(x = 1, y = n, fill = cyl)) +
      geom_col()

    As you can see, this is a single stacked bar showing the relative portions. Adding coord_polar(theta = 'y') wraps this bar into a pie chart.
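One optional refinement, offered as a sketch rather than as part of the original walkthrough: cyl is numeric in mtcars, so fill = cyl is drawn as a continuous color gradient. Treating it as a factor gives each slice its own distinct color, and theme_void() removes the axes that make little sense on a pie. The counts are rebuilt here with base R only so that the block runs on its own.

```r
library(ggplot2)

# Counts per cylinder class; table() returns cyl as a factor.
counts <- as.data.frame(table(cyl = mtcars$cyl))
names(counts) <- c("cyl", "n")

pie <- ggplot(counts, aes(x = 1, y = n, fill = cyl)) +
  geom_col(width = 1, color = "white") +                # one stacked bar
  geom_text(aes(label = n),
            position = position_stack(vjust = 0.5)) +   # count in the middle of each slice
  coord_polar(theta = "y") +                            # wrap the bar into a circle
  theme_void()                                          # drop axes and grid lines

print(pie)
```

Printing the counts on the slices also softens the precision problem mentioned in the pitfalls, since readers no longer have to judge angles alone.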

    A great alternative to pie? Waffles!

    OK, so you don't love pie. Waffle charts are an excellent alternative. While waffle charts are similar to pie charts, they encode each level, class, or value of a categorical variable as a share of the squares in a grid.

    Pitfalls of a waffle chart

    • Similar to pie charts, waffle charts can quickly get bogged down when you include too many classes.
    • Definitely don't try to facet waffle or pie charts. Faceting does not lend itself to a reasonable comparison of the 'relative proportion', which is the whole purpose.

    To prep your data for a waffle chart, you need to scale each class's count to a whole-number percentage, so that the values add up to 100. For this we'll use dplyr (install.packages('dplyr'), then library(dplyr)).

    What you'll see below is that we group our dataset by our categorical variable, then summarise to get the counts with n(). From there, we create a new variable called percent using mutate(). The big thing here is that, inside mutate(), we create the value scaled to 100.

    We'll set up the names for case_counts and then we'll run waffle().

    count <- mtcars %>%
      group_by(cyl) %>%
      summarise(n = n()) %>%
      mutate(percent = round(n / sum(n) * 100))
    case_counts
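The code above is cut off in this copy of the post, so what follows is only a sketch of how the remaining steps might look, not the author's original code. It assumes the waffle package is installed and that its waffle() function accepts a named numeric vector; the label text built with paste() is illustrative.

```r
library(dplyr)
library(waffle)   # assumption: waffle() comes from the waffle package

count <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n()) %>%
  mutate(percent = round(n / sum(n) * 100))

# Named numeric vector: one value per class, named by the class label.
case_counts <- count$percent
names(case_counts) <- paste(count$cyl, "cylinders")

# Worth checking: independently rounded percentages can total 99 or 101.
print(sum(case_counts))

# Ten rows, one square per percentage point.
waffle(case_counts, rows = 10)
```

If the rounded values do not total exactly 100 for your own data, the grid will have a ragged last column, so it is worth adjusting the largest class by the difference before plotting.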
