# MASH : Maths and Stats Help

### A-Z of Maths, Stats, and Numeracy Terminology

Here you'll find definitions and explanations of many mathematical terms, and statistical concepts and tests. You'll also find links to further information about them as well as the 'how-to' guides to accompany them.

This will forever be a work in progress, so please let us know if you would like something added.

### A

Accuracy: How close measured scores in a dataset are to their true values. The true value may be decided based on background literature or previous studies.

Algebra: The area of mathematics where letters and other general symbols are used to represent numbers and quantities in formulae and equations.

Alternate-form reliability: A method of measuring reliability which employs a functionally similar version of a survey or instrument alongside the original version. In this way, it works to counteract the practice effect which hinders test-retest reliability. Examples of such variations include changing the ordering or the wording of some questions in the survey/instrument.

Alternative hypothesis: A hypothesis which predicts that there will be a significant result on a statistical test. It contradicts the null hypothesis. It is sometimes known as H1.

Analysis of covariance (ANCOVA): An ANOVA (see ANOVA) that accounts for the effect of an added confounding variable, referred to as the covariate, when considering the relationship between the factor(s) and the dependent variable.  [SPSS Guide]

Analysis of variance (ANOVA): An analysis of the difference among means where one or more independent variables (referred to as factors) are used to group the data. The data captured for each group is then compared with the others to observe any statistically significant differences between them. This method is best used when a factor has three or more levels, i.e. when it divides the data into three or more distinct groups.

Argument: A value or term that is provided to a function, for example in R, to help execute the function.

### B

Bartlett’s test for homogeneity of variance: A test used to determine whether the variances of all samples in an analysis are roughly equal. It tests the null hypothesis that they are equal, so a significant result indicates unequal variances. It is best applied when the data is suspected to be roughly normally distributed.

Between-Subjects ANOVA: A One-Way Between-Subjects ANOVA is a parametric hypothesis test that compares the difference between more than two independent groups, such as comparing the difference between three distinct groups. [SPSS Guide]

Bias: A general term referring to systematic (i.e. not random) deviation of an estimate from the true value.

Bimodal distribution: A distribution curve where two ‘peaks’ are observed.

Binomial Test: A test of whether the observed proportions of two possible outcomes differ from their expected probabilities. These outcomes can be assumed to be equally likely (probability = 0.5) or not (e.g., rolling a 1 on a die; probability ≈ 0.167).

Bonferroni correction: An adjustment applied to the p-value of each comparison when several comparisons are made, such as the post hoc comparisons between groups that follow an ANOVA. This is performed to reduce the probability of a Type I error.
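As a minimal sketch, assuming the common form of the correction in which each p-value is multiplied by the number of comparisons (and capped at 1), the adjustment could look like this in Python. The function name and the example p-values are illustrative only.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)  # number of comparisons being made
    return [min(p * m, 1.0) for p in p_values]


# Hypothetical p-values from three pairwise comparisons
raw_p = [0.01, 0.04, 0.30]
adjusted_p = bonferroni_adjust(raw_p)

# A comparison stays significant only if its adjusted p-value is below 0.05
significant = [p < 0.05 for p in adjusted_p]
print(adjusted_p, significant)
```

Here only the first comparison remains significant after adjustment, even though the second had a raw p-value below 0.05.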

Breakeven Point: A business term used to denote the quantity at which total costs are equal to total revenue. At this quantity, the profit is zero and the business is said to 'break even'.
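The sketch below shows one way to find a breakeven quantity, assuming a simple linear model that is not part of the definition above: total costs are fixed costs plus a variable cost per unit, and total revenue is price per unit multiplied by quantity. All names and figures are hypothetical.

```python
def breakeven_quantity(fixed_costs, price_per_unit, variable_cost_per_unit):
    """Quantity at which total revenue equals total costs (linear model)."""
    margin = price_per_unit - variable_cost_per_unit  # contribution per unit
    if margin <= 0:
        raise ValueError("Price must exceed the variable cost per unit.")
    return fixed_costs / margin


# Hypothetical figures: 1000 fixed costs, price 25, variable cost 5 per unit
q = breakeven_quantity(1000, 25, 5)
total_revenue = 25 * q
total_costs = 1000 + 5 * q
print(q, total_revenue - total_costs)  # profit at the breakeven point is zero
```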

### C

Calculus: The area of mathematics involving derivatives and integrals. It is the study of continuous change, such as motion.

Central limit theorem: A theorem which states that the sampling distribution of the mean draws closer to being normally distributed as the gathered sample size increases.

Chain Rule: A rule of differentiation showing that the full derivative is equal to the product of two split derivatives. If y depends on t, and t depends on x, then:

$${dy \over dx} = {dy \over dt} \cdot {dt \over dx}$$
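As a short worked example (the function is chosen purely for illustration), take y = (3x + 1)² and let t = 3x + 1, so that y = t²:

$${dy \over dt} = 2t, \qquad {dt \over dx} = 3, \qquad {dy \over dx} = {dy \over dt} \cdot {dt \over dx} = 2t \cdot 3 = 6(3x + 1)$$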

Chi-square test (χ²): A statistical test for goodness-of-fit which tests the null hypothesis that the observed distribution of a discrete variable coincides with an expected distribution.  [SPSS Guide]

Cohen’s d: A measurement of effect size used to determine the size of a difference between means, such as in the case of a t-test.  Cohen’s d values equal to or greater than 0.2 represent a small effect, values equal to or greater than 0.5 represent a medium effect, and values equal to or larger than 0.8 represent a large effect.
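One common way to calculate Cohen’s d for two independent groups divides the difference between the group means by the pooled standard deviation. The sketch below assumes that version of the formula; the function name and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores for two small groups
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)  # 0.5, a medium effect by the thresholds above
```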

Collinearity: Where, in a regression analysis, a strong correlation exists between two variables such that it is difficult to estimate their individual regression coefficients reliably.

Complement: A term used in set theory and probability to denote everything that is not in a given set or event. For example, within the whole numbers, the complement of the odd numbers is the even numbers.

Convergent: A type of integral where the associated limits both exist and are finite.

Confidence intervals: A range of values around a sample estimate that is expected to contain the true population value in a stated proportion of repeated experiments. Confidence intervals bracket a sample estimate on either side, with the most common confidence level being 95%.

Confounding variable: A variable, other than the independent variable(s) in a statistical test, which has a relationship with the dependent variable(s) that distorts the original IV-DV relationship.

Correlation: A trend in the relationship between two variables where a change in one variable is associated with a change in another. A correlation does not necessarily mean that one variable directly causes a change in the other (causation).

Correlation coefficient: A measurement of the degree of correlation between two variables. It is a value between –1 and +1, representing a negative correlation if it is below 0, and a positive correlation if it is above 0. Examples of correlation coefficients include r (from Pearson’s r) and ρ (from Spearman’s Rank).

Corollary: A proposition that follows from (and is often appended to) a mathematical proof.

Cosine Rule: A common equation in trigonometry, which relates the lengths of the sides of a triangle to the cosine of one of its angles.

$$c^2 = a^2 + b^2 - 2ab \cos(\gamma)$$

$$b^2 = a^2 + c^2 - 2ac \cos(\beta)$$

$$a^2 = b^2 + c^2 - 2bc \cos(\alpha)$$
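As an illustrative sketch, the first form of the rule can be used to find the third side of a triangle from two sides and the angle between them. The function name is hypothetical, and the angle is assumed to be given in degrees.

```python
import math


def third_side(a, b, gamma_degrees):
    """Length of side c, opposite the angle gamma between sides a and b."""
    gamma = math.radians(gamma_degrees)
    return math.sqrt(a**2 + b**2 - 2 * a * b * math.cos(gamma))


# With a right angle the cosine term vanishes, recovering the Pythagorean theorem
c = third_side(3, 4, 90)
print(round(c, 6))  # 5.0
```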

Cronbach’s alpha (α): A measure of the reliability or internal consistency of a multi-item scale, in a single value. The reliability coefficient derived from this test can range from 0 to 1, with the ideal score often being .70 or higher for new scales, .80 or higher for established scales.  [SPSS Guide]

### D

Dependent variable (DV):  A variable that depends on other factors, normally the manipulation of an independent variable. These variables are often measured in experiments.

Descriptive statistics: Statistics which quantitatively describe the properties of a data set, such as the mean, median, mode, standard deviation, or frequency distributions.

Differentiation: The process of finding the derivative or the rate of change of a function. It is often considered the inverse of integration.

Distribution curve: A graphical representation of the spread of scores for a continuous variable.

Divergent: A type of integral where the associated limits either do not exist or are (positive or negative) infinity.

### E

Effect size: A measure of how meaningful the relationship between two variables, or the difference between groups, is. Different measures of effect size are used for different statistical tests, such as Cohen’s d and partial eta squared (ηp²). This is often interpreted in terms such as ‘small’, ‘medium’ and ‘large’.

Expected value (expectation): The mean of a probability distribution, calculated as the sum of the value of each event (x) multiplied by the probability of that event, P(x). The expected value of a random variable X is often denoted by E(X) or E[X].

$$E[X] = \sum{xP(x)}$$
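For instance, the formula can be applied to a fair six-sided die. This is a minimal Python sketch; the die is an assumed example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# A fair six-sided die: each face x has probability P(x) = 1/6
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] is the sum of each value multiplied by its probability
expected_value = sum(x * p for x, p in distribution.items())
print(expected_value)  # 7/2, i.e. 3.5
```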

### F

Factor analysis: A method used to observe patterns in a data set by reducing it down to a set of variables for similar items in a measure (known as dimensions). Factor analysis can be performed without any preconceived ideas of the data’s structure (exploratory analysis), or to verify a specific idea that a researcher has about the data’s structure (confirmatory analysis). This method is particularly useful in studies which try to better understand psychological variables or socioeconomic status.

Fleiss' kappa: A measure of inter-rater agreement used to determine how well two or more raters agree on their scoring of nominal data.

Fisher's exact test: A statistical test to determine whether or not there is a significant association between two categorical variables. Commonly used when the sample size is small and the assumptions of a Chi-square test (χ²) are violated.

Frequency distribution: A display of how often each data value occurs in the data set or specific variable. This could be shown in a table or graph.

Friedman’s ANOVA: A non-parametric statistical test of difference, suitable for comparing between more than two related groups. The test can be seen as a non-parametric alternative to a Repeated-Measures ANOVA. [SPSS Guide]

### G

Geometry: The study of lines, angles, shapes, and their properties. Geometry studies physical shapes and object dimensions.

Graph: A diagram depicting the relationship between two or more variables. Examples of graphs include bar charts, box and whisker plots, histograms, line graphs, pie charts, and scatter plots.

### H

Heteroscedasticity: Where the variances of two variables being compared are unequal. Parametric tests make the assumption of homogeneity of variance, so heteroscedastic data violates this assumption and makes the results of such tests less reliable.

Histogram: A graph depicting the distribution of a numerical variable in the data set. A histogram divides a data set into equal intervals of values (often referred to as bins) and then records the frequency with which the values in the data set fall into each interval. This is depicted through an array of bars, with a higher bar indicating a higher frequency of values falling within that interval. Normal distribution curves are often depicted using histograms, as they are good for showing where the data forms peaks or becomes skewed.

Homoscedasticity: Where the variances of two variables being compared are roughly equal. This is ideal for parametric testing, where the assumption of homogeneity of variance is made.

Hypothesis: A prediction of the outcome of a statistical test. There are two types of hypotheses: the null (H0) and the alternative (H1).

### I

Independent measures/between-subjects ANOVA: An ANOVA where the data points which are separated into groups are gathered from different participants. For example, researchers may want to examine the effect of a pharmaceutical drug on people with a certain condition compared to people who don’t have that condition.  [SPSS Guide]

Independent Samples t-Test: An independent-samples t-test is a parametric hypothesis test which compares the means between two unrelated groups, such as comparing the difference between class 1 and class 2. [SPSS Guide] [R Guide] [JAMOVI Guide]

Independent Samples: A grouping condition, opposite to paired-samples, commonly used for t-tests. Subjects in one group are entirely distinct and independent from the subjects in the other groups.

Independent variable (IV): A variable whose variation doesn’t depend on that of another. These are often manipulated in experiments to measure any resulting changes in the dependent variable(s).

Inferential statistics: The practice of inferring properties of a population from data gathered from a sample of it. One example of inferential statistics is hypothesis testing, which includes t-tests, correlations, ANOVAs etc.

Integration: The process of finding a function g(x) whose derivative is another function f(x). It is often considered the inverse of differentiation.

Internal consistency: A method of measuring reliability which involves measuring the reliability of the individual items of a test. This is usually performed using Cronbach’s alpha.

Inter-rater reliability: A method of measuring reliability which involves conducting the same measure multiple times, but with different people conducting it each time.

Interquartile range: A measure of dispersion. It is the difference between the 1st and 3rd quartiles of a dataset's distribution. A larger interquartile range means a more dispersed distribution of data.

Interval Data: A classification or 'level' of data that comprises continuous measurements without an absolute zero. Examples include Temperature (°C or °F).

Intra-rater reliability: A method of measuring reliability which involves conducting the same measure multiple times, with the same people conducting it each time.

### K

Kendall’s tau rank test (τ): A non-parametric test of correlation between two variables, where each variable is measured as ordinal data. This test uses the test statistic τ.

Kruskal-Wallis test: A non-parametric statistical test of difference, suitable for comparing between more than two independent groups. The test is a non-parametric version of a one-way Between-Subjects ANOVA. [SPSS Guide]

Kurtosis: The steepness or flatness of a distribution curve.

### L

Levene's Test: A statistical test used to assess the homogeneity of variances between two or more groups. Commonly used prior to conducting ANOVAs or t-tests. [R Guide]

Linear Regression: A statistical test that is a type of regression (see Regression) that is used to predict a scale-level variable.

### M

Mann-Whitney U-Test: A non-parametric alternative to the independent-samples t-test. The test compares the ranked scores, rather than the means, of two unrelated groups, such as comparing the difference between class 1 and class 2. [SPSS Guide] [R Guide] [JAMOVI Guide]

Mean: An average that expresses the middle or typical value of a set of numbers. The mean is calculated as the sum of all values, divided by the number of values included.

Median: The middle score or data point in a set of ranked data points. When the number of data points is even, there is no true middle, therefore the median is the mean of the two middle values.
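The mean and median can be compared on a small dataset using Python's built-in `statistics` module. The scores below are hypothetical and were chosen to have an even number of values.

```python
from statistics import mean, median

scores = [2, 3, 7, 10]  # hypothetical data with an even number of values

mean_score = mean(scores)
median_score = median(scores)

print(mean_score)    # 5.5, the sum (22) divided by the count (4)
print(median_score)  # 5.0, the mean of the two middle values (3 and 7)
```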

### N

Nominal Data: A classification or 'level' of data that comprises named but unordered categories. Examples include Gender (Male, Female, Non-Binary), Eye Colour (Blue, Green, Brown), and Subject Choice (Maths, Chemistry, Sociology).

### O

Ordinal Data: A classification or 'level' of data that comprises ordered categories. Examples include Height (Short, Medium, Tall), Age (Young, Middle Age, Old), and University Years (First Year, Second Year, Third Year).

Ordinal Regression: A statistical test that is a type of regression (see Regression) that is used to predict an ordinal-level variable.

### P

Paired Samples t-Test: A paired-samples t-test is a parametric hypothesis test which compares the means between two related groups, such as comparing the difference between one group of participants before and after an intervention. [SPSS Guide] [R Guide] [JAMOVI Guide]

Parametric Differentiation: The process of taking the derivative (see Differentiation) of a variable y with respect to another variable x, where both variables are defined in terms of a parameter t:

$${dy \over dx} = {dy(t) \over dx(t)} = {{dy \over dt} \over {dx \over dt}}$$
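As a short worked example (the functions are chosen purely for illustration), take x = t² and y = t³, with t ≠ 0:

$${dx \over dt} = 2t, \qquad {dy \over dt} = 3t^2, \qquad {dy \over dx} = {{dy \over dt} \over {dx \over dt}} = {3t^2 \over 2t} = {3t \over 2}$$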

Pearson's r Correlation: A test of the linear relationship between two scale variables. [SPSS Guide] [JAMOVI Guide]

Poisson: A type of distribution that expresses the probability of a given number of events occurring in a fixed interval. Commonly used with count data.

Post-Hoc: A secondary statistical test used to find the exact source of a difference once another statistical test has confirmed the existence of a difference somewhere.

Power Rule: An essential/fundamental rule of differentiation. If x is a variable and is raised to a power n, then the derivative of x raised to the power n is represented by:

$${d \over dx} x^n = n \cdot x^{(n-1)}$$

Product Rule: An equally essential/fundamental rule of differentiation. If a function y(x) can be expressed as f(x)g(x), then the derivative of y(x) can be represented by:

$${dy(x) \over dx}={d[f(x)\cdot g(x)] \over dx}= {df(x) \over dx}\cdot g(x) + f(x)\cdot {dg(x) \over dx}$$
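As a short worked example (the functions are chosen purely for illustration), take f(x) = x² and g(x) = x³, so that y(x) = x⁵:

$${d[x^2 \cdot x^3] \over dx} = 2x \cdot x^3 + x^2 \cdot 3x^2 = 2x^4 + 3x^4 = 5x^4$$

This agrees with applying the Power Rule directly to x⁵.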

Proof: A logical argument, or a series of arguments, that demonstrates that a theorem is true. Proofs can range from a diagram or a few sentences up to 10,000 pages.

Pythagorean identity: A trigonometry rule expressing the Pythagorean theorem in terms of the sine and cosine functions.

$$\sin^2 \theta + \cos^2 \theta = 1$$

Pythagorean theorem: A trigonometry rule expressing the relationship between the two shorter sides (a and b) of a right-angled triangle and its hypotenuse (c).

$$a^2 + b^2 = c^2$$

### Q

QQ (Quantile-Quantile) Plot: A method for comparing the distribution of a sample against a theoretical distribution (commonly a normal distribution). If the sample matches the theoretical distribution, the points on the graph will form a straight diagonal line.

### R

Range: A measure of the spread or variability in a dataset. The difference between the maximum (largest) and minimum (smallest) values.

Ratio Data: A classification or 'level' of data that comprises continuous measurements with an absolute zero. Examples include Distance (km), Weight (kg), and Temperature (K).

Repeated Measures ANOVA: A One-Way Repeated-Measures ANOVA is a parametric hypothesis test that compares the difference between more than two related groups, such as comparing the difference between three conditions that all participants experience. [SPSS Guide]

Row-Echelon Form: A type of matrix where all of the non-zero rows have a pivot (a non-zero entry such that all the entries to its left and below it are equal to zero).

### S

Scale Data: A classification of data that encompasses both Ratio and Interval level data, or continuous measurements. Examples include Height (in centimetres), Weight (in kilograms), Temperature (°C) and Test Scores (as a percentage).

Sigma Notation: The Greek letter Sigma (Σ) is commonly used to represent a total sum. Suppose we have n values x1, x2, ..., xn, and we wish to add them all up:

$$x_1 +x_2 + ... +x_n= \displaystyle \sum_{i=1}^n x_i$$
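The same sum can be written in Python. This is a minimal sketch with hypothetical values; note that Python lists are indexed from 0, whereas the notation above counts from 1.

```python
values = [4, 8, 15, 16, 23, 42]  # hypothetical x_1 ... x_n, so n = 6
n = len(values)

# Explicit form of the sum over i = 1..n (Python lists are indexed from 0)
total = sum(values[i - 1] for i in range(1, n + 1))

print(total)                 # 108
print(total == sum(values))  # True
```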

Sign Test: A non-parametric statistical test that compares the sign differences between two sets of matched data. The test is very similar to the non-parametric Wilcoxon test; however, the sign test ignores the size of each difference and counts only the number of positive and negative differences.

Sine Rule: A common equation in trigonometry which relates the lengths of the sides of any triangle to the sines of its angles.

$${\sin \alpha \over a} = {\sin \beta \over b} = {\sin \gamma \over c}$$

Standard Deviation (SD or σ): A statistic that shows the spread or variability of a dataset. It measures how far the data points differ from the mean (see Mean). A large standard deviation means the data is very spread out, while a small standard deviation means the data is very close together. It is the square root of the Variance.
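The link between the variance and the standard deviation can be checked with Python's built-in `statistics` module. This sketch assumes the population formulas, which divide by n; the module's `variance` and `stdev` functions give the sample versions, which divide by n - 1. The data is hypothetical.

```python
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores with a mean of 5

variance = pvariance(data)  # mean of the squared distances from the mean
sd = pstdev(data)           # square root of the variance

print(variance, sd)
```

For this data the variance is 4 and the standard deviation is 2.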

Sum Rule: An essential/fundamental rule of differentiation. If a function y(x) can be expressed as f(x) + g(x), then the derivative of y(x) can be represented by:

$${d[f(x) + g(x)] \over dx}= {df(x) \over dx} + {dg(x) \over dx}$$

### T

Theorem: A proposition which is not self-evident but proved by a chain of reasoning often in the form of a mathematical proof.

Trigonometry: The area of mathematics involving the relations of the sides and angles of triangles and the relevant functions of any angles.

### U

Univariate: An analysis technique that studies the sample distribution of a single variable without reference or regard to other variables.

### V

Variance: A measure of the spread or distribution of scores in a dataset. The larger the variance, the further the individual scores tend to be from the sample mean. It is the square of the Standard Deviation.

Varimax: A type of orthogonal rotation, commonly used in Factor Analysis, that maximises the variance of the squared loadings within each factor.

### W

Welch's Test: An inferential test of difference that is used when the groups have unequal variances, i.e. when the assumption of homogeneity of variance is violated.

Wilcoxon 'Signed-Ranks' Test: A non-parametric alternative to the paired-samples t-test. This test of difference compares the ranked differences, rather than the means, between two related groups, such as comparing the difference between pre-intervention and post-intervention test results. [SPSS Guide] [R Guide]

### X

X-axis: The horizontal axis (going left to right along the page) on a 2D graph.

### Y

Y-axis: The vertical axis (going top to bottom down the page) on a 2D graph.

### Z

Z-Score: A standard score which describes the position of an original score in terms of its distance from the mean score measured in units of standard deviation.
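In other words, a z-score is the original score minus the mean, divided by the standard deviation. A minimal Python sketch, with a hypothetical function name and test figures:

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, in units of standard deviation."""
    return (score - mean) / sd


# Hypothetical test: mean 60, standard deviation 5
print(z_score(70, 60, 5))  # 2.0, two standard deviations above the mean
print(z_score(55, 60, 5))  # -1.0, one standard deviation below the mean
```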

Z-Test: An inferential statistical test used to determine whether the means of two populations are different when the population variances are known.