8.1. Introduction

  • Probability is the study of random events.

  • Statistics is the discipline of using data samples to support claims about populations.

  • Computation is a tool that is well suited to quantitative analysis.

8.1.1. Anecdotal evidence

  • Evidence based on data that is unpublished and usually personal, such as opinion.

  • Small number of observations

  • Selection bias

    People who join a discussion might be more inclined towards a specific conclusion.

  • Confirmation bias

    People who believe the claim might be more likely to contribute examples that confirm it. People who doubt the claim are more likely to cite counterexamples.

  • Inaccuracy

    Personal stories are often misremembered, misrepresented, or repeated inaccurately.

8.1.2. Statistical approach

  • Data collection

  • Descriptive statistics

  • Exploratory data analysis

  • Hypothesis testing

  • Estimation

Population

A group we are interested in studying, such as a group of people, animals, or minerals.

Cross-sectional study

A study that collects data about a population at a particular point in time.

Longitudinal study

A study that follows a population over time, collecting data from the same group repeatedly.

Respondent

A person who responds to a survey.

Cohort

A group of respondents.

Sample

The subset of the population used to collect data.

Representative

A sample is representative if every member of the population has the same chance of being in the sample.
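
One way to obtain a sample in which every member has the same chance of selection is simple random sampling. The following is a minimal sketch using Python's standard `random` module; the population of 1000 numeric IDs, the sample size of 50, and the seed are assumptions made up for illustration.

```python
import random

# Hypothetical population: 1000 member IDs (made up for illustration).
population = list(range(1000))

# A fixed seed keeps the sketch reproducible.
rng = random.Random(42)

# sample() draws without replacement, so every member of the
# population has the same chance of ending up in the sample.
sample = rng.sample(population, k=50)
```

In practice the hard part is usually not the draw itself but obtaining a complete list of the population to draw from.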

Oversampling

The technique of increasing the representation of a sub-population in order to avoid errors due to small sample sizes.
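
An oversampled dataset no longer mirrors the population, so summary statistics are often computed with weights. The sketch below is an assumption about how such a correction might look, not something stated in these notes: the group labels, population sizes, and values are all made up, and each respondent is weighted by the number of population members they stand for.

```python
# Hypothetical population sizes for two sub-populations (made-up numbers).
population_size = {"A": 900, "B": 100}

# A sample that oversamples group B: two respondents from each group.
sample = [
    {"group": "A", "value": 10},
    {"group": "A", "value": 10},
    {"group": "B", "value": 20},
    {"group": "B", "value": 20},
]

# Count how many respondents each group contributes to the sample.
sample_size = {}
for respondent in sample:
    group = respondent["group"]
    sample_size[group] = sample_size.get(group, 0) + 1


def weight(respondent):
    """Number of population members this respondent represents."""
    group = respondent["group"]
    return population_size[group] / sample_size[group]


# The unweighted mean treats the oversampled group as if it were half the population.
unweighted_mean = sum(r["value"] for r in sample) / len(sample)

# The weighted mean scales each respondent by the weight computed above.
total_weight = sum(weight(r) for r in sample)
weighted_mean = sum(r["value"] * weight(r) for r in sample) / total_weight
```

With these made-up numbers the unweighted mean overstates the contribution of group B, while the weighted mean matches the value implied by the assumed population sizes.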

Record

A collection of information about a single person or other object of study.

Field

One of the named variables that makes up a record.

Table

A collection of records.
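
The terms record, field, and table map onto simple data structures. A minimal sketch in Python, assuming a hypothetical survey whose field names (`respondent_id`, `age`, `birth_weight_oz`) and values are invented for illustration:

```python
# Each record is a dict describing one respondent; its keys are the fields.
# All field names and values here are hypothetical.
table = [
    {"respondent_id": 1, "age": 27, "birth_weight_oz": 118.0},
    {"respondent_id": 2, "age": 31, "birth_weight_oz": 125.0},
    {"respondent_id": 3, "age": 24, "birth_weight_oz": 110.0},
]

# The fields are the named variables that make up a record.
field_names = list(table[0].keys())

# Reading one field across every record in the table gives a column of values.
ages = [record["age"] for record in table]
```

A list of dicts is only one possible layout; a dataframe library or a database table would express the same idea.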

Raw data

Values collected and recorded with little or no checking, calculation or interpretation.

Recode

A value that is generated by calculation and other logic applied to raw data.
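
As an illustration of a recode, the sketch below derives one value from two raw fields. The field names `weight_lb` and `weight_oz` and the sample values are assumptions, not taken from any particular dataset.

```python
OUNCES_PER_POUND = 16


def total_weight_oz(pounds, ounces):
    """Recode: combine two raw fields into a single weight in ounces."""
    return pounds * OUNCES_PER_POUND + ounces


# Hypothetical raw data: weight recorded as separate pound and ounce fields.
raw_records = [
    {"weight_lb": 7, "weight_oz": 6},
    {"weight_lb": 8, "weight_oz": 0},
]

# Store the recode next to the raw fields so the raw values stay available.
for record in raw_records:
    record["total_weight_oz"] = total_weight_oz(
        record["weight_lb"], record["weight_oz"]
    )
```

Keeping the raw fields alongside the recode makes it possible to recheck the calculation later.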

Summary statistic

The result of a computation that reduces a dataset to a single number (or a small set of numbers) that captures some characteristic of the data.
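
A few common summary statistics can be computed with Python's standard `statistics` module. The weights below are made-up values used only to show the idea.

```python
import statistics

# Hypothetical dataset: five weights in ounces (made-up values).
weights_oz = [110.0, 118.0, 125.0, 128.0, 134.0]

# Each statistic reduces the whole dataset to a single number.
mean_weight = statistics.mean(weights_oz)      # central tendency
median_weight = statistics.median(weights_oz)  # middle value
weight_range = max(weights_oz) - min(weights_oz)  # a simple measure of spread
```

Which statistic is appropriate depends on the characteristic of interest; the mean and median describe a typical value, while the range describes spread.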

Apparent effect

A measurement or summary statistic that suggests that something interesting is happening.

Statistically significant

An apparent effect is statistically significant if it is unlikely to have occurred by chance.
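
One common way to estimate how likely an apparent effect is to occur by chance is a permutation test. The sketch below is an illustration under stated assumptions rather than the method prescribed by these notes: the function name `permutation_p_value`, its parameters, and the two groups are hypothetical, and the effect measured is the absolute difference in group means.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, iterations=1000, seed=0):
    """Estimate the fraction of random relabelings whose difference in
    means is at least as large as the observed difference."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(iterations):
        # Shuffling the pooled values simulates "no real difference between groups".
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    return count / iterations


# Hypothetical groups: widely separated values versus identical values.
p_separated = permutation_p_value([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])
p_identical = permutation_p_value([1, 1, 1], [1, 1, 1])
```

A small estimated p-value suggests the apparent effect is unlikely to occur by chance; a large one suggests chance is a plausible explanation. The threshold used to call a result significant is a convention, not something the test decides.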

Artifact

An apparent effect that is caused by bias, measurement error, or some other kind of error.

Change log

Last Modified

$Id: intro.rst 249 2012-08-05 06:17:57Z shailesh $