For this tutorial, please install and load the “rstatix” package.
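As a minimal sketch, the package can be installed from CRAN once and then loaded at the start of each session. The `requireNamespace()` guard is an addition here, not part of the original tutorial; it only skips the download when the package is already installed.

```r
# Install rstatix only if it is not already available (assumes access to CRAN)
if (!requireNamespace("rstatix", quietly = TRUE)) {
  install.packages("rstatix")
}

# Load the package so that get_summary_stats() can be called directly
library(rstatix)
```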
The data used in this example contains age, gender and job satisfaction from a sample of the general population.
The easiest way to obtain descriptive statistics for a variable is to use the get_summary_stats() function from the “rstatix” package.
The first argument of the function is the data set you want descriptive statistics for; the second argument is the variable within that data set to summarise.
get_summary_stats(GSS_R_data, age)
## # A tibble: 1 × 13
## variable n min max median q1 q3 iqr mad mean sd se
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 age 1448 18 89 42.5 32 58 26 18.5 45.8 17.3 0.453
## # ℹ 1 more variable: ci <dbl>
The function outputs the descriptive statistics usually reported for parametric analyses (mean, sd, se) as well as those usually reported for non-parametric analyses (median, iqr, mad). It is important to note that the function skips any missing values (NA).
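The following sketch illustrates both points on a small made-up data frame. `toy_data` is hypothetical and merely stands in for GSS_R_data. The `type` argument of get_summary_stats() restricts the output to a subset of the statistics, here the mean and standard deviation.

```r
library(rstatix)

# Hypothetical toy data standing in for GSS_R_data; one age is missing
toy_data <- data.frame(
  age = c(23, 35, NA, 51, 64),
  sex = c("Female", "Male", "Female", "Male", "Female")
)

# Full set of statistics; the NA in age is skipped rather than causing an error
full_stats <- get_summary_stats(toy_data, age)
full_stats

# Only the mean and standard deviation
mean_sd_stats <- get_summary_stats(toy_data, age, type = "mean_sd")
mean_sd_stats
```

Other values of `type`, such as "median_iqr" or "five_number", select different subsets in the same way.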
The simplest way to obtain descriptive statistics for separate groups is to create a separate data frame for each group, then run the function mentioned above on each of them.
We can create separate groups by using the subset() function. For the first argument we input the data frame. The second argument of the function is a condition to select the subset by (variable == value).
In our example below, we subset the main data frame based on the variable “sex” being “Female”, then store the new data frame in an object named “female_data”. We do the same for males. Finally, we use the get_summary_stats() function to get descriptive statistics for both groups.
female_data <- subset(GSS_R_data, sex == "Female")
male_data <- subset(GSS_R_data, sex == "Male")
get_summary_stats(female_data, age)
## # A tibble: 1 × 13
## variable n min max median q1 q3 iqr mad mean sd se
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 age 814 18 89 43 32 59 27 19.3 46.1 17.5 0.614
## # ℹ 1 more variable: ci <dbl>
get_summary_stats(male_data, age)
## # A tibble: 1 × 13
## variable n min max median q1 q3 iqr mad mean sd se
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 age 634 18 89 42 33 57 24 16.3 45.4 16.9 0.672
## # ℹ 1 more variable: ci <dbl>
Note: The final statistic in the output, the confidence interval of the mean (ci), is not technically a descriptive statistic. It is considered an inferential statistic, but it is often presented alongside means and standard deviations.
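As an alternative to creating one data frame per group, the same per-group statistics can be obtained in a single call by grouping the data first. This is a sketch under stated assumptions, not the method used above: it relies on group_by() and the pipe from the “dplyr” package, and `toy_data` is a hypothetical data frame standing in for GSS_R_data.

```r
library(rstatix)
library(dplyr)

# Hypothetical toy data standing in for GSS_R_data
toy_data <- data.frame(
  age = c(23, 35, 41, 51, 64, 30),
  sex = c("Female", "Male", "Female", "Male", "Female", "Male")
)

# Group by sex, then summarise age within each group
grouped_stats <- toy_data %>%
  group_by(sex) %>%
  get_summary_stats(age, type = "mean_sd")
grouped_stats
```

The result has one row per group, which keeps the groups side by side in a single table instead of spreading them over several outputs.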