Which type of visualization is best when you want to compare proportions in a large volume of data
13 Powerful Ways to Visualize Your Data (with Examples)
By Sisense Team
Three Key Charts for Visualizing Proportion Data
Proportion data examples
Whatever your application of data analytics and data science, proportions are everywhere. Proportions are all about understanding the different parts that make up a whole. In practice, proportion data is a count of something across a given categorical variable. That could be the number of customers across different industries, the number of sales calls in different geographies, the number of activities across various activity types, or the number of ice cream cones sold in various flavors. If you can count it and break it into groups, then you've got proportion data!
Basic proportion visualization
Whether or not you're familiar with the idea of "exploratory data analysis", simple plots of basic statistics are very helpful to any analysis, especially while you are establishing the foundation of understanding that will inform your more complex analysis.
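As a minimal sketch of the counting idea above, assuming R and its built-in mtcars dataset (the same data the walkthrough uses), base R's table() and prop.table() turn a categorical column into counts and shares of the whole:

```r
# mtcars ships with base R; cyl (number of cylinders) is the categorical variable.
counts <- table(mtcars$cyl)

# prop.table() divides each count by the total, giving the share of the whole.
shares <- prop.table(counts)

print(counts)
print(round(100 * shares, 1))  # shares expressed as percentages
```

Any column you can group by works the same way; only the column name changes.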
Sweet visualizations
I am going to break down three visualization types that prove very useful for analyzing proportions: pie charts, waffle charts, and bar charts (imagine that they're actually maple bar or candy bar charts for the sake of the "sweets" theme).
Pie Time
Pitfalls of a pie chart
Redeeming qualities of using pie charts
Let's get to it!
From the ground up, using the mtcars dataset, let's build a pie chart. First things first, install and load ggplot2 (install.packages('ggplot2'), then library(ggplot2)) and you're off to the races.
A quick breakdown of ggplot:
From here, add a geom such as geom_bar() at the end to tell ggplot exactly what type of chart you'd like to see. We'll jump into the syntax shortly, but with ggplot you effectively create the visualization object and then tell that object how you want it drawn.
First, to give you a quick idea of the data: below, we group by the cylinders variable (cyl) and count the number of records in each group. The pipe and the grouping functions come from dplyr, so load it first.
    library(dplyr)
    counts <- mtcars %>% group_by(cyl) %>% summarise(n = n())
Let's throw this into a pie!
    ggplot(counts, aes(x = 1, y = n, fill = factor(cyl))) + geom_col() + coord_polar(theta = 'y')
Boom! There's your first pie chart. The categorical variable you're grouping by goes into the fill aesthetic (wrapped in factor() here so that ggplot treats the numeric cyl column as discrete categories), and the count, or n as I've written it, goes into the y aesthetic. You may also notice the geom_col() call as well as coord_polar(). To give an idea of the purpose of coord_polar(), I'll run this with only geom_col():
    ggplot(counts, aes(x = 1, y = n, fill = factor(cyl))) + geom_col()
As you can see, this is a stacked bar showing the relative portions. Adding coord_polar(theta = 'y') wraps this bar into a pie chart.
A great alternative to pie? Waffles!
OK, so you don't love pie. Waffle charts are an excellent alternative. While waffle charts are similar to pie charts, they encode each level, class, or value of a categorical variable as a proportion of squares.
Pitfalls of a waffle chart
To prep your data for a waffle chart, you need to scale the values to whole numbers that add up to 100. For this we'll use dplyr (install.packages('dplyr'), library(dplyr)). Below, we group our dataset by our categorical variable, then summarise according to the counts, n(). From there, we create a new variable called percent using mutate(). The big thing here is that inside mutate() we are creating the value scaled to 100. We then set the names for case_counts and run waffle(), which comes from the waffle package.
    library(dplyr)
    library(waffle)
    count <- mtcars %>% group_by(cyl) %>% summarise(n = n()) %>% mutate(percent = round(n / sum(n) * 100))
    case_counts <- count$percent
    names(case_counts) <- count$cyl
    waffle(case_counts)
OK, we're on our way!
Let's wrap it up with bar charts
For many purposes, bars simply work better for comparing one value against another. Let's unroll our pie and throw it into bars. Also take note that this is not a histogram: we are treating the cylinder count as a categorical variable.
    library(ggplot2)
    ggplot(mtcars, aes(x = as.factor(cyl))) + geom_bar()
Best practice for stacked bars: don't make them in isolation, and keep in mind that they are not nearly as useful beyond three categories. The key is that the wholes being compared all share the same y axis.
Something to keep in mind for bars is that anything far beyond three variables will be a lot more difficult to interpret.
To reorder the bars of your bar chart, make sure the categorical variable is a factor, then set its levels in the order you want them displayed. By default, ggplot orders the bars and legend by the factor's levels, which are sorted alphabetically or numerically unless you specify otherwise. To override this, turn the cyl column into a factor with the levels in the order we want our plot to use.
    mtcars %>% mutate(cyl = factor(cyl, levels = c('8', '6', '4')))
This can often play a big part in organizing your plots for interpretability.
Conclusion
Enjoy getting your hands dirty with proportion charts and data visualization for categorical data.
As you familiarize yourself with different charting techniques, it will do you well to think of them as tools you might use for a given data type and situation. Happy data science-ing! And don't forget to follow my blog to get more posts related to machine learning, data visualization, data wrangling, and all things data science: datasciencelessons.com.
Which type of visualization is best when you want to compare proportions in a large volume of data with multiple categories and subcategories?
Treemaps. A treemap represents data as rectangles in hierarchical form. Treemaps generally show proportions in distinct colors and sizes so that users can more easily understand large volumes of data. This type of chart is great when the data involves multiple subcategories that are difficult to analyze in bar charts.
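A rough sketch of the treemap idea, under several assumptions: it uses R's built-in mtcars dataset as a small stand-in for a large one, treats cyl as the category and gear as the subcategory, and draws the chart with the treemapify package only if it happens to be installed. The data preparation itself is plain base R.

```r
# Count cars for every cylinder/gear combination (category x subcategory).
counts <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear))
counts <- counts[counts$Freq > 0, ]             # drop empty combinations
counts$share <- counts$Freq / sum(counts$Freq)  # share of the whole

# Draw the treemap only when the optional packages are available.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("treemapify", quietly = TRUE)) {
  library(ggplot2)
  library(treemapify)
  p <- ggplot(counts, aes(area = Freq, fill = cyl, subgroup = cyl, label = gear)) +
    geom_treemap() +                   # one rectangle per cyl/gear combination
    geom_treemap_subgroup_border() +   # outline each cyl group
    geom_treemap_text(colour = "white")
  print(p)
}
```

Each rectangle's area is proportional to its count, and rectangles sharing a cyl value are grouped together, which is what makes subcategories readable at a glance.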
Which chart type is best for comparing proportions easily?
Bar charts are good for comparisons, while line charts work better for trends.
How do you visualize a proportion of data?
Waffle charts are an excellent alternative to pie charts. While waffle charts are similar to pie charts, they encode each level, class, or value of a categorical variable as a proportion of squares.
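A minimal sketch of turning counts into waffle squares, assuming R, the built-in mtcars dataset, and a 100-square grid. Independently rounded percentages do not always total 100, so this sketch swaps in largest-remainder rounding as a safeguard; that is an illustrative choice, not a required part of waffle charts. The waffle package is treated as optional.

```r
counts <- table(mtcars$cyl)
exact <- 100 * counts / sum(counts)   # exact percentages, may be fractional
squares <- floor(exact)               # whole squares each category gets for sure
leftover <- 100 - sum(squares)        # squares still to hand out

# Give the leftover squares to the categories with the largest remainders.
top_up <- order(exact - squares, decreasing = TRUE)[seq_len(leftover)]
squares[top_up] <- squares[top_up] + 1

# Named integer vector: one entry per category, values summing to 100.
parts <- setNames(as.integer(squares), names(squares))
print(parts)

# Draw the chart only when the optional waffle package is installed.
if (requireNamespace("waffle", quietly = TRUE)) {
  print(waffle::waffle(parts, rows = 10))
}
```

With a 10 by 10 grid, each square stands for one percent of the whole.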
Which visualization type is best for comparing data points?
Bar graphs can help you compare data between different groups or track changes over time. Bar graphs are most useful when there are big changes or when you want to show how one group compares against other groups. The example above compares the number of customers by business role.
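A minimal sketch of a group comparison with bars, assuming R and the built-in mtcars dataset rather than the customer data mentioned here. Sorting the counts before plotting is an illustrative choice that makes the ranking of groups easier to read.

```r
# Count cars in each cylinder group.
counts <- table(mtcars$cyl)

# Sort from largest to smallest so the tallest bar comes first.
ordered_counts <- sort(counts, decreasing = TRUE)

# Base R bar chart: one bar per group, height equal to the count.
barplot(ordered_counts, xlab = "Cylinders", ylab = "Number of cars")
```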