SciPy - Stats


SciPy Stats is the scipy.stats module of the SciPy library, designed specifically for statistical analysis in Python. SciPy is a library for scientific and numerical computing, and its scipy.stats module provides a wide range of probability distributions, statistical tests and functions for statistical analysis.

Key Features of SciPy Stats

SciPy Stats offers a broad set of statistical tools and functions for data analysis and hypothesis testing. The main features of the scipy.stats module are described below −

Probability Distributions in SciPy Stats

The scipy.stats module provides a comprehensive set of probability distributions including continuous and discrete distributions. These distributions allow for probability calculations, data modeling and statistical analysis in Python.

Types of Probability Distributions in SciPy Stats

The scipy.stats module provides a variety of probability distributions, which fall into two main types −

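Continuous Probability Distributions: Distributions of a random variable that can take any value within a continuous range (for example, the normal distribution).
Discrete Probability Distributions: Distributions of a random variable that takes only specific, countable values (for example, the binomial distribution).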
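A short sketch of the difference, assuming SciPy is installed. The standard normal distribution stands in for the continuous case and a binomial distribution with n=10 and p=0.5 (values chosen only for illustration) for the discrete case. Continuous distributions expose pdf(), which returns a density, while discrete distributions expose pmf(), which returns an actual probability; both provide cdf().

```python
from scipy import stats

# Continuous case: standard normal (loc=0, scale=1).
# pdf() returns a probability *density*, which is not itself a probability.
density_at_0 = stats.norm.pdf(0.0)

# Discrete case: binomial with n=10 trials and success probability p=0.5.
# pmf() returns the probability of observing exactly k successes.
prob_exactly_5 = stats.binom.pmf(5, n=10, p=0.5)   # C(10, 5) / 2**10 = 252/1024

# Both kinds share cdf(), which gives P(X <= x).
p_norm_le_0 = stats.norm.cdf(0.0)                  # 0.5 by symmetry
p_binom_le_5 = stats.binom.cdf(5, n=10, p=0.5)     # (1+10+45+120+210+252)/1024 = 638/1024

print(density_at_0, prob_exactly_5, p_norm_le_0, p_binom_le_5)
```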

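Each distribution in scipy.stats is an object that exposes a common set of methods. The table below lists these methods using the normal distribution, scipy.stats.norm, as the example. The same method names apply to other distributions, except that discrete distributions provide pmf() (probability mass function) instead of pdf(), and the fit() method is available only on continuous distributions −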

S.No	Function & Description
1	scipy.stats.norm.pdf() Evaluates the probability density function (PDF) at a given point. For a continuous variable this is a density, not a probability.
2	scipy.stats.norm.cdf() Cumulative distribution function (CDF): returns the probability that the random variable is less than or equal to x.
3	scipy.stats.norm.ppf() Percent point function (inverse of the CDF): returns the value x at which the CDF equals a given probability.
4	scipy.stats.norm.sf() Survival function: returns the probability that the random variable is greater than x (1 - CDF).
5	scipy.stats.norm.isf() Inverse survival function: returns the value x at which the survival function (1 - CDF) equals a given tail probability.
6	scipy.stats.norm.rvs() Generates random samples from the normal distribution.
7	scipy.stats.norm.fit() Estimates the mean (loc) and standard deviation (scale) of the given data by maximum likelihood.
8	scipy.stats.norm.mean() Returns the theoretical mean of the normal distribution.
9	scipy.stats.norm.var() Returns the theoretical variance of the normal distribution.
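The methods in the table can be exercised together as follows. This is a minimal sketch assuming a normal distribution with mean 100 and standard deviation 15 (illustrative values, not from the text); loc and scale are the keyword names SciPy uses for the mean and standard deviation of norm.

```python
from scipy import stats

# Illustrative parameters (assumed for this example): mean 100, standard deviation 15.
mu, sigma = 100, 15

pdf_at_mean = stats.norm.pdf(100, loc=mu, scale=sigma)  # density at the mean
p_le_115 = stats.norm.cdf(115, loc=mu, scale=sigma)     # P(X <= 115), one sd above the mean
p_gt_115 = stats.norm.sf(115, loc=mu, scale=sigma)      # P(X > 115) = 1 - p_le_115
x_95 = stats.norm.ppf(0.95, loc=mu, scale=sigma)        # 95% of the probability lies below x_95
x_top5 = stats.norm.isf(0.05, loc=mu, scale=sigma)      # 5% lies above x_top5, so it equals x_95

# Draw a reproducible sample and recover loc (mean) and scale (sd) from it.
sample = stats.norm.rvs(loc=mu, scale=sigma, size=10_000, random_state=42)
est_mu, est_sigma = stats.norm.fit(sample)

# Theoretical moments of the same distribution.
theory_mean = stats.norm.mean(loc=mu, scale=sigma)      # 100.0
theory_var = stats.norm.var(loc=mu, scale=sigma)        # sigma**2 = 225.0

print(f"P(X <= 115) = {p_le_115:.4f}, 95th percentile = {x_95:.2f}")
print(f"fitted mean = {est_mu:.2f}, fitted sd = {est_sigma:.2f}")
```

Note that sf() and isf() give the same answers as 1 - cdf() and ppf(1 - q), but they are computed directly, which tends to be more accurate far out in the tail.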

Statistical Tests (Hypothesis Testing)

Statistical tests (hypothesis testing) are used to make inferences or decisions about a population based on sample data. The core idea is to compare two hypotheses: the null hypothesis (H₀), which states that there is no effect or difference, and the alternative hypothesis (H₁), which states that there is an effect or difference.

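Based on the data, a statistical test measures the strength of the evidence against the null hypothesis, usually reported as a p-value: a p-value below a chosen significance level (commonly 0.05) leads to rejecting the null hypothesis. Below are the key functions in the scipy.stats module for performing statistical tests −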

S.No.	Function & Description
1	scipy.stats.ttest_1samp() Performs a one-sample t-test to compare the sample mean to a known population mean.
2	scipy.stats.ttest_ind() Performs an independent two-sample t-test to compare means from two independent groups.
3	scipy.stats.ttest_rel() Performs a paired t-test to compare means from two related samples.
4	scipy.stats.chi2_contingency() Performs a Chi-square test for independence on a contingency table.
5	scipy.stats.f_oneway() Performs a one-way ANOVA test to compare means of two or more groups.
6	scipy.stats.levene() Tests for equality of variances across groups i.e., homogeneity of variance test.
7	scipy.stats.shapiro() Performs the Shapiro-Wilk test for normality of the dataset.
8	scipy.stats.ks_1samp() Performs a one-sample Kolmogorov-Smirnov test to compare a sample with a distribution.
9	scipy.stats.ks_2samp() Performs a two-sample Kolmogorov-Smirnov test to compare two independent samples.
10	scipy.stats.mannwhitneyu() Performs the Mann-Whitney U test, a non-parametric test for comparing two independent samples.
11	scipy.stats.wilcoxon() Performs the Wilcoxon signed-rank test for comparing two related samples.
12	scipy.stats.pearsonr() Computes Pearson's correlation coefficient and p-value for testing non-correlation.
13	scipy.stats.spearmanr() Computes Spearman's rank correlation coefficient.
14	scipy.stats.kruskal() Performs the Kruskal-Wallis H-test for comparing two or more independent samples.
15	scipy.stats.friedmanchisquare() Performs the Friedman test for repeated measures across multiple conditions.
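A typical workflow combines several of these functions: check the assumptions of a test, run it, and compare the p-value with a significance level. The sketch below uses two small, made-up groups of measurements (hypothetical data) and a significance level of 0.05 chosen by the analyst. Results are unpacked as tuples, which works across SciPy versions.

```python
from scipy import stats

# Two illustrative, made-up groups of measurements (hypothetical data).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.2, 5.9, 5.5, 5.7]
group_b = [6.2, 6.8, 6.1, 6.5, 7.0, 6.4, 6.9, 6.3, 6.6, 6.7]

alpha = 0.05  # significance level chosen by the analyst

# Assumption checks: Shapiro-Wilk for normality, Levene for equal variances.
w_a, p_norm_a = stats.shapiro(group_a)
w_b, p_norm_b = stats.shapiro(group_b)
w_lev, p_levene = stats.levene(group_a, group_b)
print(f"normality p-values: {p_norm_a:.3f}, {p_norm_b:.3f}; Levene p-value: {p_levene:.3f}")

# Independent two-sample t-test. H0: the two population means are equal.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

if p_value < alpha:
    print(f"t = {t_stat:.2f}, p = {p_value:.2e} < {alpha}: reject H0")
else:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f} >= {alpha}: fail to reject H0")
```

If the normality checks failed, a non-parametric alternative such as scipy.stats.mannwhitneyu() could be used in place of the t-test; if only the equal-variance check failed, passing equal_var=False to ttest_ind() runs Welch's t-test instead.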

Descriptive Statistics

Descriptive statistics comprises techniques that summarize and present the key characteristics of a dataset. It helps interpret the data by highlighting its distribution, central tendency and variability. Typical summary measures include central tendency (mean, median, mode), spread (range, variance, standard deviation) and the shape of the distribution (skewness, kurtosis).

Here are the functions available in scipy.stats module which are used to perform Descriptive Statistics −

S.No	Function & Description
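1	scipy.stats.tmean(data) Computes the arithmetic mean of the dataset. tmean() is a trimmed mean, but when no limits are given nothing is trimmed, so the result is the ordinary mean.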
2	numpy.median(data) Finds the middle value of the sorted dataset (scipy.stats has no plain median function, so NumPy's is used).
3	scipy.stats.mode(data) Returns the most frequently occurring value in the dataset.
4	scipy.stats.tvar(data) Calculates the sample variance (ddof=1) of the dataset.
5	scipy.stats.tstd(data) Computes the sample standard deviation (ddof=1) of the dataset.
6	scipy.stats.iqr(data) Computes the interquartile range (IQR) of the dataset.
7	scipy.stats.skew(data) Measures the asymmetry of the data distribution.
8	scipy.stats.kurtosis(data) Evaluates the "tailedness" of the distribution; by default it returns excess (Fisher) kurtosis, which is 0 for a normal distribution.
9	scipy.stats.scoreatpercentile(data, q) Returns the value below which a certain percentage of observations fall.
10	scipy.stats.mstats.mquantiles(data) Computes the quantiles of the dataset.
11	scipy.stats.trim_mean(data, proportiontocut) Calculates the mean after removing a proportion of the smallest and largest values.
12	scipy.stats.tmin(data) Returns the minimum value in the dataset.
13	scipy.stats.tmax(data) Returns the maximum value in the dataset.
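The following sketch applies the functions above to a small, made-up dataset (illustrative values only). It uses numpy.median for the median, since scipy.stats does not provide a plain median function.

```python
import numpy as np
from scipy import stats

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # small illustrative dataset

mean = stats.tmean(data)                  # 5.0 (no limits given, so nothing is trimmed)
median = np.median(data)                  # 4.5, average of the two middle values 4 and 5
mode_res = stats.mode(data)               # most frequent value 4, occurring 3 times
sample_var = stats.tvar(data)             # sample variance with ddof=1: 32 / 7
sample_std = stats.tstd(data)             # square root of the sample variance
spread_iqr = stats.iqr(data)              # 75th percentile minus 25th percentile
skewness = stats.skew(data)               # positive here: the long tail is on the right
excess_kurt = stats.kurtosis(data)        # Fisher definition: normal distribution -> 0
p90 = stats.scoreatpercentile(data, 90)   # value below which 90% of observations fall
trimmed = stats.trim_mean(data, 0.125)    # drops 12.5% (one value) from each end
lo, hi = stats.tmin(data), stats.tmax(data)

print(mean, median, mode_res.mode, sample_var, spread_iqr, p90, trimmed, lo, hi)
```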

Correlation & Regression Analysis in SciPy Stats

Correlation and regression analysis are powerful statistical methods used to examine the relationship between two or more variables. These techniques help identify patterns, assess the strength of associations and make predictions based on data.

S.No.	Function and Description
1	scipy.stats.pearsonr() Calculates the Pearson correlation coefficient and p-value for testing non-correlation.
2	scipy.stats.spearmanr() Computes the Spearman rank-order correlation coefficient.
3	scipy.stats.kendalltau() Calculates Kendall's tau, a correlation measure for ordinal data.
4	scipy.stats.linregress() Performs simple linear regression and returns slope, intercept, and other statistics.
5	scipy.stats.pointbiserialr() Computes the point-biserial correlation coefficient for binary and continuous data.
6	scipy.stats.variation() Calculates the coefficient of variation (CV), which measures relative variability.
7	scipy.stats.ttest_ind() Performs an independent t-test to compare means of two independent samples.
8	scipy.stats.ttest_rel() Performs a paired t-test to compare means of related samples.
9	scipy.stats.f_oneway() Performs a one-way ANOVA test to compare means of multiple groups.
10	scipy.stats.chisquare() Performs the chi-square test for goodness-of-fit.
11	scipy.stats.chi2_contingency() Performs the chi-square test for independence between categorical variables.
12	scipy.stats.mannwhitneyu() Performs the Mann-Whitney U test for comparing two independent samples.
13	scipy.stats.wilcoxon() Performs the Wilcoxon signed-rank test for paired samples.
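The correlation and regression functions can be compared on the same paired data. The sketch below uses hypothetical observations (hours studied and exam score, invented for illustration). Because the scores rise strictly with hours, the rank-based measures (Spearman, Kendall) are exactly 1, while Pearson's r measures how close the relationship is to a straight line.

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations: hours studied vs exam score.
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([52, 55, 61, 64, 70, 71, 78, 83])

r, r_p = stats.pearsonr(hours, score)          # linear association
rho, rho_p = stats.spearmanr(hours, score)     # monotonic (rank) association
tau, tau_p = stats.kendalltau(hours, score)    # rank concordance

# Least-squares line: score = slope * hours + intercept.
fit = stats.linregress(hours, score)
predicted_at_9 = fit.slope * 9 + fit.intercept  # extrapolated prediction for 9 hours

print(f"r = {r:.3f}, rho = {rho:.3f}, tau = {tau:.3f}")
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}, r^2 = {fit.rvalue**2:.3f}")
```

linregress() also reports the correlation coefficient (rvalue), so for simple linear regression it matches pearsonr().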

Random Sampling in SciPy Stats

Random sampling is a fundamental technique in statistics used to select a subset of individuals from a population or dataset for analysis. SciPy provides various methods for generating random samples from different probability distributions −

S.No.	Function and Description
1	scipy.stats.uniform.rvs() Generates random samples from a uniform distribution over [loc, loc + scale], which is [0, 1] by default.
2	scipy.stats.norm.rvs() Generates random samples from a normal (Gaussian) distribution with a given mean (loc) and standard deviation (scale).
3	scipy.stats.randint.rvs() Generates random integers from a discrete uniform distribution between low (inclusive) and high (exclusive).
4	scipy.stats.binom.rvs() Generates random samples from a binomial distribution with parameters n (number of trials) and p (probability of success).
5	scipy.stats.poisson.rvs() Generates random samples from a Poisson distribution with rate parameter mu (mean number of events).
6	scipy.stats.expon.rvs() Generates random samples from an exponential distribution with a given scale parameter (inverse of rate).
7	scipy.stats.beta.rvs() Generates random samples from a Beta distribution with shape parameters a and b.
8	scipy.stats.gamma.rvs() Generates random samples from a Gamma distribution with shape and scale parameters.
9	scipy.stats.chi2.rvs() Generates random samples from a Chi-square distribution with df degrees of freedom.
10	scipy.stats.f.rvs() Generates random samples from an F-distribution with dfnum and dfden degrees of freedom.
11	scipy.stats.t.rvs() Generates random samples from a Student's t-distribution with df degrees of freedom.
12	scipy.stats.weibull_min.rvs() Generates random samples from a Weibull distribution with shape parameter c.
13	scipy.stats.dirichlet.rvs() Generates random samples from a Dirichlet distribution with concentration parameter alpha.
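A sketch of drawing samples from several of these distributions, assuming a recent SciPy that accepts a NumPy Generator as random_state. All parameter values and the seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# A NumPy Generator makes the draws reproducible; the seed value is arbitrary.
rng = np.random.default_rng(seed=0)

uniform_draws = stats.uniform.rvs(size=5, random_state=rng)                # [0, 1] by default
normal_draws = stats.norm.rvs(loc=10, scale=2, size=5, random_state=rng)   # mean 10, sd 2
dice_rolls = stats.randint.rvs(low=1, high=7, size=5, random_state=rng)    # integers 1..6
binom_draws = stats.binom.rvs(n=10, p=0.3, size=5, random_state=rng)       # successes out of 10
poisson_draws = stats.poisson.rvs(mu=4, size=5, random_state=rng)          # mean 4 events
expon_draws = stats.expon.rvs(scale=2.0, size=5, random_state=rng)         # rate = 1/scale = 0.5
t_draws = stats.t.rvs(df=5, size=5, random_state=rng)                      # 5 degrees of freedom
dirichlet_draws = stats.dirichlet.rvs(alpha=[1.0, 2.0, 3.0], size=5, random_state=rng)  # rows sum to 1

# Passing the same integer seed twice yields identical samples.
first = stats.norm.rvs(size=3, random_state=42)
again = stats.norm.rvs(size=3, random_state=42)

print(dice_rolls, binom_draws, dirichlet_draws.sum(axis=1))
```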

Data Ranking & Scaling in SciPy Stats

Data ranking and scaling are important techniques in statistics to adjust the scale of data for comparison or to assess the relative positions of observations. Ranking involves ordering data, while scaling adjusts the range or distribution to standardize or normalize it.

S.No.	Function and Description
1	scipy.stats.rankdata() Ranks the values in an array, with ties receiving the average rank.
2	scipy.stats.zscore() Standardizes an array by scaling it to have zero mean and unit variance.
3	scipy.stats.mstats.rankdata() Ranks the values of a masked array, handling missing or invalid (masked) values; by default, masked entries receive rank 0.
4	scipy.stats.mstats.zscore() Standardizes masked data arrays by transforming to zero mean and unit variance.
5	scipy.stats.percentileofscore() Calculates the percentile rank of a score within a given dataset.
6	scipy.stats.trim_mean() Computes the mean of a dataset after removing a given proportion of the smallest and largest values.
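The ranking and scaling functions are sketched below on a small, hypothetical set of scores. The masked-array part assumes the last value is missing; scipy.stats.mstats.rankdata() leaves it out of the ranking and gives it rank 0 by default.

```python
import numpy as np
from scipy import stats

scores = np.array([70, 85, 85, 90, 60])  # hypothetical scores

ranks = stats.rankdata(scores)              # ties share the average rank: [2, 3.5, 3.5, 5, 1]
z = stats.zscore(scores)                    # (x - mean) / std, with ddof=0 by default
pct = stats.percentileofscore(scores, 85)   # percentile rank of 85 (default kind='rank')

# Masked-array variant: treat the last score as missing.
masked = np.ma.masked_array(scores, mask=[False, False, False, False, True])
masked_ranks = stats.mstats.rankdata(masked)  # masked entry is excluded and gets rank 0

print(ranks, np.round(z, 3), pct, masked_ranks)
```

With the default kind='rank', ties are handled by averaging: two values lie strictly below 85 and four are at or below it, which places 85 at the 70th percentile.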
