8.2. Descriptive Statistics
8.2.1. Mean
The sample mean is defined as:
\[\mu = \frac{1}{n} \sum_{i=1}^{n} x_i\]
where \(x_1, \dots, x_n\) are the values in a sample of size \(n\).
The mean of a sample is the summary statistic computed with the formula above.
The mean is one way to describe the central tendency of data.
An average is one of many summary statistics one might choose to describe the typical value or the central tendency of a sample.
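A minimal Python sketch of the sample-mean formula, assuming the sample is a plain list of numbers. The function name `mean` is illustrative only; Python's standard library also provides `statistics.mean`.

```python
def mean(xs):
    """Sample mean: the sum of the values divided by how many there are."""
    if not xs:
        raise ValueError("mean requires at least one value")
    return sum(xs) / len(xs)

sample = [1, 2, 2, 3, 4, 7, 9]  # hypothetical toy sample
print(mean(sample))  # 4.0
```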
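8.2.2. Variance
The variance of a sample is defined as:
\[\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2\]
The quantity \(x_i - \mu\) is called the deviation from the mean, so the variance is the mean of the squared deviations.
The square root of the variance, \(\sigma\), is called the standard deviation.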
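The sketch below computes the variance with a \(1/n\) denominator, matching the definition of \(\sigma^2\) as the mean squared deviation, and takes its square root to get the standard deviation. The helper names are illustrative. Note that Python's `statistics.pvariance` uses the same \(1/n\) convention, while `statistics.variance` divides by \(n - 1\) instead, so the two give different results on the same data.

```python
import math
import statistics

def variance(xs):
    """Variance with a 1/n denominator: the mean of the squared deviations x_i - mu."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(xs))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical toy sample with mean 5
print(variance(sample))  # 4.0
print(std_dev(sample))   # 2.0

# The standard library offers both conventions:
# statistics.pvariance divides by n (as above), statistics.variance by n - 1.
print(statistics.pvariance(sample), statistics.variance(sample))
```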
8.2.3. Distribution
Summary statistics are concise but dangerous, because they can hide important features of the data.
A histogram is a graph that shows the frequency or probability of each value.
Probability in this context is a frequency expressed as a fraction of the sample size.
The process of converting frequencies to probabilities is called normalization.
A normalized histogram is called a PMF, or probability mass function.
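A minimal sketch of a histogram and its normalized form, assuming the histogram is stored as a mapping from values to frequencies. The helper names `make_hist` and `make_pmf` are hypothetical; `collections.Counter` does the counting.

```python
from collections import Counter

def make_hist(xs):
    """Histogram: map each value to its frequency (how many times it appears)."""
    return Counter(xs)

def make_pmf(xs):
    """PMF: normalize the histogram by dividing each frequency by the sample size."""
    n = len(xs)
    return {x: freq / n for x, freq in make_hist(xs).items()}

sample = [1, 2, 2, 3, 5]  # hypothetical toy sample
print(dict(make_hist(sample)))  # {1: 1, 2: 2, 3: 1, 5: 1}
print(make_pmf(sample))         # {1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2}
```

Because the probabilities in a PMF sum to 1 whatever the sample size, PMFs of samples with different sizes can be compared directly, which raw frequencies cannot.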
The most common value in a distribution is called its mode.
The mode is also a summary statistic. In some cases, the mode does a very good job of describing the typical value.
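A short sketch of finding the mode from a frequency count. The function name `mode` is illustrative; the standard library's `statistics.mode` and `statistics.multimode` are alternatives. This version breaks ties in favor of the value encountered first, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter

def mode(xs):
    """Return the most frequent value; ties go to the value encountered first."""
    if not xs:
        raise ValueError("mode requires at least one value")
    value, _count = Counter(xs).most_common(1)[0]
    return value

print(mode([1, 2, 2, 3, 5]))  # 2
```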
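A histogram also makes outliers visible: values that are far away from the central tendency.
Comparing two histograms directly is difficult when the samples differ in size, because their frequencies are on different scales; normalizing each one into a PMF puts them on a common scale.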
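8.2.4. Outliers
Outliers are values far away from the central tendency. An outlier may be a correct but unusual measurement or an error in collecting or recording the data, so it is worth examining before deciding whether to trim it.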
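A minimal sketch of trimming, assuming one common approach: sort the data and drop a fixed fraction `p` from each end. The names `trim` and `trimmed_mean` and the parameter `p` are hypothetical, and the data are made up to contain one obvious outlier.

```python
def trim(xs, p=0.01):
    """Drop the smallest and largest fraction p of the values (hypothetical helper)."""
    if not 0 <= p < 0.5:
        raise ValueError("p must be in [0, 0.5)")
    xs = sorted(xs)
    k = int(p * len(xs))  # number of values removed from each end
    # Slice up to len(xs) - k rather than -k: xs[0:-0] would be empty when k == 0.
    return xs[k:len(xs) - k]

def trimmed_mean(xs, p=0.01):
    """Mean of the values that remain after trimming."""
    t = trim(xs, p)
    return sum(t) / len(t)

sample = [3, 4, 4, 5, 5, 5, 6, 6, 7, 250]  # hypothetical data with one outlier
print(trim(sample, p=0.1))          # [4, 4, 5, 5, 5, 6, 6, 7]
print(sum(sample) / len(sample))    # 29.5
print(trimmed_mean(sample, p=0.1))  # 5.25
```

The single outlier pulls the plain mean far from the bulk of the data, while the trimmed mean stays close to it.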
8.2.5. Relative Risk
Relative risk is a ratio of two probabilities.
Example
The probability that a first baby is born early is 18.2%.
The probability that other babies are born early is 16.8%.
The relative risk is \(0.182 / 0.168 \approx 1.08\); it is a ratio, not a percentage.
In other words, first babies are about 8% more likely to be born early.
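The arithmetic in the example can be checked with a short Python sketch; `relative_risk` is a hypothetical helper name, and the two probabilities are the ones quoted in the example.

```python
def relative_risk(p_group, p_baseline):
    """Relative risk: the probability in one group divided by the probability in another."""
    return p_group / p_baseline

p_first = 0.182   # probability that a first baby is born early
p_others = 0.168  # probability that other babies are born early
rr = relative_risk(p_first, p_others)
print(round(rr, 2))                          # 1.08
print(f"{(rr - 1) * 100:.0f}% more likely")  # 8% more likely
```

Subtracting 1 from the ratio and multiplying by 100 turns the relative risk into the "percent more likely" phrasing used above.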
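8.2.6. Conditional Probability
A conditional probability is a probability computed under the assumption that some condition holds; for example, the probability that a baby is born early given that it is a first baby.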
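As a hedged illustration, the sketch below computes a conditional probability as a fraction over only those values that satisfy the condition. The helper `conditional_probability` is hypothetical, and the pregnancy lengths are invented toy data, not survey results.

```python
def conditional_probability(xs, event, condition):
    """P(event | condition): among values satisfying condition, the fraction satisfying event."""
    given = [x for x in xs if condition(x)]
    if not given:
        raise ValueError("no values satisfy the condition")
    return sum(1 for x in given if event(x)) / len(given)

# Hypothetical pregnancy lengths in weeks (toy data).
weeks = [36, 38, 39, 39, 39, 40, 40, 41, 42, 43]

# Unconditional probability of a birth in week 39.
p_39 = sum(1 for w in weeks if w == 39) / len(weeks)
# Probability of a birth in week 39, given no birth before week 39.
p_39_given = conditional_probability(weeks, lambda w: w == 39, lambda w: w >= 39)
print(p_39)        # 0.3
print(p_39_given)  # 0.375
```

Restricting the sample to values that meet the condition shrinks the denominator, which is why the conditional probability differs from the unconditional one.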
- Central tendency
A characteristic of a sample or population; intuitively, it is the most average value.
- Spread
A characteristic of a sample or population; intuitively, it describes how much variability there is.
- Variance
A summary statistic often used to quantify spread.
- Standard deviation
The square root of variance, also used as a measure of spread.
- Frequency
The number of times a value appears in a sample.
- Histogram
A mapping from values to frequencies, or a graph that shows this mapping.
- Probability
A frequency expressed as a fraction of the sample size.
- Normalization
The process of dividing a frequency by a sample size to get a probability.
- Distribution
A summary of the values that appear in a sample and the frequency or probability of each.
- PMF
Probability mass function: a representation of a distribution as a function that maps from values to probabilities.
- Mode
The most frequent value in a sample.
- Outlier
A value far from the central tendency.
- Trim
To remove outliers from a dataset.
- Bin
A range used to group nearby values.
- Relative risk
A ratio of two probabilities, often used to measure a difference between distributions.
- Conditional probability
A probability computed under the assumption that some condition holds.
- Clinically significant
A result, such as a difference between groups, that is relevant in practice.
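The Bin entry in the glossary can be illustrated with a short sketch that groups values into fixed-width bins and counts each bin. The helper `bin_counts`, the bin width of 10, and the data are assumptions for illustration; each bin is labeled by its lower edge.

```python
from collections import Counter

def bin_counts(xs, width):
    """Count values per bin; x falls in [k * width, (k + 1) * width), labeled k * width."""
    return Counter((x // width) * width for x in xs)

ages = [21, 23, 27, 31, 35, 38, 44]  # hypothetical toy data
print(sorted(bin_counts(ages, 10).items()))  # [(20, 3), (30, 3), (40, 1)]
```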
8.2.7. Reference
Change log
- Last Modified
$Id: descriptive.rst 249 2012-08-05 06:17:57Z shailesh $