
MASH : Maths and Stats Help

Descriptive Statistics

Introduction

For this tutorial, please install and load the “rstatix” package.
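A minimal sketch of one way to do this, assuming the package is installed from CRAN and that the mirror URL shown is reachable from your machine:

```r
# Install rstatix from CRAN only if it is not already available.
# The mirror URL is an assumption; any CRAN mirror can be used.
if (!requireNamespace("rstatix", quietly = TRUE)) {
  install.packages("rstatix", repos = "https://cloud.r-project.org")
}

# Attach the package so get_summary_stats() can be called directly.
library(rstatix)
```

Installation is only needed once per machine, while library(rstatix) has to be run in every new R session.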

The data used in this example contain age, gender and job satisfaction for a sample of the general population, stored in a data frame named GSS_R_data.


Procedure

The easiest way to obtain descriptive statistics for a variable is to use the get_summary_stats() function from the “rstatix” package.

The first argument of the function is the data set you want descriptive statistics from; the second argument is the variable in that data set to summarise.

library(rstatix)
get_summary_stats(GSS_R_data, age)
## # A tibble: 1 × 13
##   variable     n   min   max median    q1    q3   iqr   mad  mean    sd    se
##   <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 age       1448    18    89   42.5    32    58    26  18.5  45.8  17.3 0.453
## # ℹ 1 more variable: ci <dbl>


The output covers the statistics typically reported for parametric data (mean, sd, se) as well as those typically reported for non-parametric data (median, q1, q3, iqr, mad). Note that the function skips any missing values (NA), so n counts only the non-missing observations.
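If only some of these statistics are needed, the type argument of get_summary_stats() can narrow the output. The sketch below uses a small made-up data frame (toy_data is hypothetical, not the tutorial's GSS_R_data) with one missing age, to show both the type argument and the handling of NA:

```r
library(rstatix)

# Hypothetical toy data (not the tutorial's GSS_R_data); one age is missing.
toy_data <- data.frame(age = c(23, 35, 41, NA, 58, 62))

# Count the missing values that the summary will skip.
sum(is.na(toy_data$age))

# Mean and standard deviation only.
get_summary_stats(toy_data, age, type = "mean_sd")

# Median and interquartile range only.
get_summary_stats(toy_data, age, type = "median_iqr")
```

In both summaries the n column should reflect the five non-missing ages rather than all six rows.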


The simplest way to obtain descriptive statistics for separate groups is to create a separate data frame for each group, then run the function described above on each one.

We can create these data frames with the subset() function. The first argument is the data frame. The second argument is a logical condition of the form variable == value; only the rows that satisfy it are kept.
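As a small illustration of how the condition works, here is a sketch on a made-up data frame (toy_data and its values are hypothetical, not part of the tutorial data):

```r
# Hypothetical toy data frame (not the tutorial's GSS_R_data).
toy_data <- data.frame(
  sex = c("Female", "Male", "Female", "Male", "Female"),
  age = c(34, 51, 27, 45, 62)
)

# Keep only the rows where the condition sex == "Female" is TRUE.
toy_female <- subset(toy_data, sex == "Female")

# Conditions can be combined with & (and) or | (or).
toy_older_female <- subset(toy_data, sex == "Female" & age >= 30)

# Number of rows kept by each condition.
nrow(toy_female)
nrow(toy_older_female)
```

Rows for which the condition evaluates to NA are treated as not matching, so they are dropped from the subset.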

In our example below, we subset the main data frame on the variable “sex” being “Female”, then store the new data frame in an object named “female_data”. We do the same for males, storing the result in “male_data”. Finally, we use the get_summary_stats() function to get descriptive statistics for each group.

female_data <- subset(GSS_R_data, sex == "Female")
male_data <- subset(GSS_R_data, sex == "Male")

get_summary_stats(female_data, age)
## # A tibble: 1 × 13
##   variable     n   min   max median    q1    q3   iqr   mad  mean    sd    se
##   <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 age        814    18    89     43    32    59    27  19.3  46.1  17.5 0.614
## # ℹ 1 more variable: ci <dbl>
get_summary_stats(male_data, age)
## # A tibble: 1 × 13
##   variable     n   min   max median    q1    q3   iqr   mad  mean    sd    se
##   <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 age        634    18    89     42    33    57    24  16.3  45.4  16.9 0.672
## # ℹ 1 more variable: ci <dbl>
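
As an alternative to creating one data frame per group, get_summary_stats() also accepts grouped data. The sketch below is not the procedure used above; it assumes the dplyr package is available (it is installed alongside rstatix) and uses a made-up data frame (toy_data is hypothetical):

```r
library(rstatix)

# Hypothetical toy data (not the tutorial's GSS_R_data).
toy_data <- data.frame(
  sex = c("Female", "Male", "Female", "Male", "Female", "Male"),
  age = c(34, 51, 27, 45, 62, 38)
)

# Group by sex, then summarise age within each group in a single call.
grouped_stats <- toy_data %>%
  dplyr::group_by(sex) %>%
  get_summary_stats(age, type = "mean_sd")

grouped_stats
```

The result has one row per group, which can be easier to compare than separate tables when there are more than two groups.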

Note: The final statistic in the output, the confidence interval of the mean (ci), is not technically a descriptive statistic. It is an inferential statistic, but it is often presented alongside means and standard deviations.
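
To show how a confidence interval for the mean relates to the mean and standard error, here is a base R sketch on made-up ages (the age vector is hypothetical). It assumes a 95% interval based on the t distribution and assumes that ci is reported as the margin of error (the half-width of the interval); check the package documentation to confirm this for your version.

```r
# Hypothetical ages (not the tutorial's GSS_R_data).
age <- c(23, 35, 41, 47, 58, 62, 29, 51)

n  <- length(age)
m  <- mean(age)
se <- sd(age) / sqrt(n)  # standard error of the mean

# Margin of error for a 95% confidence interval, using the t distribution.
margin <- qt(0.975, df = n - 1) * se

# Lower and upper limits of the interval.
ci_limits <- c(lower = m - margin, upper = m + margin)
ci_limits
```

The same limits can be obtained from t.test(age)$conf.int, which is a convenient cross-check.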