Chapter 1
descriptive statistics = summarize observed data and present it graphically.
Types of data
| Data | Description |
|---|---|
| Nominal Data | Categories no ordering or direction |
| Ordinal Data | ordered categories (rankings, order) |
| Interval Data | Differences between measurements but no true zero |
| Ratio Data | Differences between measurements, true zero exists |
Graphs
Dot Diagram
A line of numbers on which the observations are presented as dots equal observations are stacked.
Histogram
frequency = the count.
- choose a distribution in intervals: not too many nor too many few observations per interval.
- count the number of observations in each interval, the frequency or determine the
- Build a rectangle above each interval and choose as height either the frequency or the relative frequency.
Bar Graph
- for bar graphs the variable has to be quantitative and discrete.
Measures of Center
Mean: arithmetic averageMedian: the middle observation, the observations are arranged from small to large. if n is even then compute the mean of the middle observations.Mode: the most frequently occurring observation.
Percentiles and quartiles
- The median m is also the 50th percentile: about 50% of the observations is smaller than 50% and 50% is greater than the median m.
- The quartiles Q1, m and Q3 are the 25th, 50th and 75th percentiles they split the observations in 4 roughly equal quarters.
Box-Plot
The box plot graphs the 5-number summary of the observations
- quartiles (Q1, m, Q3)
- smallest observation.
- largest observation.
Measures of Variability
- Range: the range r = largest - smallest observation
- The inter-quartile range
- Variance: sample variance:
sample variance != population variance
- resistant for outliers: median, IQR
- non outlier resistant:
, ,
Chebyshev's rule:
The empirical rule
ony valid for bell shaped histograms
| Interval | Empirical rule | General |
|---|---|---|
68% | ||
95% | ||
99.7% |
The z-scores
For samples with mean
the z-score of an observation x is
Interpretation the distance between the value and the mean in standard deviations.
For populations with mean
Empirical Rule Applied Backwards
- 68% of observations [-1, +1] z-score
- 95% of observations [-2, +2] z-score
- 99.7% of observations [-3, +3] s-score
Skewness
normal distribution skewness = 0
| Positive Kurtosis | Symetrical Distribution | Negative Skew | |
Kurtosis
normal distribution kurtosis = 3
- Negative Kurtosis
- Normal Distribution
- Positive Kurtosis
Sample Estimators
| Measure | Population Distribution | Sample Estimate |
|---|---|---|
| Mean | ||
| Variance | ||
| Standard Deviation | ||
| Skewness | ||
| Kurtosis |
Normality Check
- Graphs: on a histogram data looks approximately normal.
- Numerically:
- Skewness coefficient: (close to 0)
- Kurtosis coefficient: (close to 3)
- Q-Q plot: no systemic deviations from the x = y line.
Exponential Distribution Check
- Graph: histogram:
- no negative values
- peak at 0
- skew right.
- Numerically
- skew (close to 2)
- kurtosis (close to 6)
- Q-Q plot: no systemic deviations from the x = y line.