Statistical inference is concerned primarily with understanding the quality of parameter estimates.
The sampling distribution represents the distribution of the point estimates based on samples of a fixed size from a certain population. It is useful to think of a particular point estimate as being drawn from such a distribution. Understanding the concept of a sampling distribution is central to understanding statistical inference.
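The idea can be made concrete with a small simulation. The sketch below is only an illustration: the population (10,000 uniform values), the sample size of 50 and the helper name sample_means are all assumptions chosen for the example. It draws many samples of a fixed size and collects the mean of each; the collected means approximate the sampling distribution of the sample mean.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples samples of a fixed size and return the mean of each.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Made-up population: 10,000 values drawn uniformly from [0, 100).
pop_rng = random.Random(42)
population = [pop_rng.uniform(0, 100) for _ in range(10_000)]

# Each entry of `means` is one point estimate of the population mean.
means = sample_means(population, sample_size=50, num_samples=1_000)

print("population mean:       ", statistics.mean(population))
print("mean of sample means:  ", statistics.mean(means))
print("spread of sample means:", statistics.stdev(means))
```

The point estimates cluster around the population mean, and their spread is much smaller than the spread of the population itself. That spread is what describes the quality of a single estimate.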
A sample statistic is a point estimate for a population parameter, e.
Probability concepts form the foundation for statistics.
A formal definition of probability:
The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times.
This definition is motivated by the law of large numbers:
As more observations are collected, the proportion of occurrences with a particular outcome converges to the probability of that outcome.
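A quick simulation shows this convergence. The sketch below assumes a fair six-sided die and tracks the running proportion of rolls that show a 1, which should settle near 1/6; the function name running_proportion and the fixed seed are choices made for the example.

```python
import random


def running_proportion(num_rolls, target=1, seed=0):
    """Return the proportion of rolls equal to `target` after each roll."""
    rng = random.Random(seed)
    hits = 0
    proportions = []
    for n in range(1, num_rolls + 1):
        if rng.randint(1, 6) == target:  # one roll of a fair die
            hits += 1
        proportions.append(hits / n)
    return proportions


props = running_proportion(100_000)

# The proportion fluctuates early on and stabilizes as rolls accumulate.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(props[n - 1], 4))
print("true probability:", round(1 / 6, 4))
```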
Disjoint (mutually exclusive) events are events that cannot both happen at the same time.
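As a small illustration, events on a single roll of a fair die can be modelled as sets of outcomes, and two events are disjoint exactly when the sets share no outcome. The event names below are made up for the example. The check also uses the standard addition rule (not stated above): for disjoint events, the probability that either one occurs is the sum of their probabilities.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))


low = {1, 2}      # roll is 1 or 2
high = {5, 6}     # roll is 5 or 6
even = {2, 4, 6}  # roll is even

# low and high cannot both happen on the same roll: they are disjoint.
print(low.isdisjoint(high))
print(prob(low | high) == prob(low) + prob(high))

# low and even share the outcome 2, so they are not disjoint,
# and simply adding their probabilities would count that outcome twice.
print(low.isdisjoint(even))
print(prob(low | even) == prob(low) + prob(even) - prob(low & even))
```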
What is Statistics?
Collected observations are called data. Statistics is the study of how best to collect, analyze, and draw conclusions from data. Each observation in data is called a case. Characteristics of the case are called variables. With a matrix/table analogy, a case is a row while a variable is a column.
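The row/column analogy can be sketched in a few lines. The records below are invented purely for illustration: each dictionary is one case (a row), and each key is a variable (a column).

```python
# Made-up data for illustration: three cases, three variables.
data = [
    {"name": "A", "age": 34, "height_cm": 170.2},
    {"name": "B", "age": 27, "height_cm": 182.5},
    {"name": "C", "age": 45, "height_cm": 165.0},
]

# The variables are the column names shared by every case.
variables = list(data[0].keys())

# Reading one variable across all cases gives a column.
ages = [case["age"] for case in data]

print("cases:", len(data))
print("variables:", variables)
print("age column:", ages)
```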
Statistics - Correlation (Courtesy: xkcd.com)
Types of variables:
Numerical - Can be discrete or continuous, and can take a wide range of numerical values.
There is a well-known adage that if you keep doing the same thing and expect different results, that is a sure sign of idiocy.
In the BI world too, we come across several instances where people take it for granted that the ‘BI tool’ will magically generate insight and spur ‘intelligence’ rather than ‘idiocy’. Yet the practice of reporting the same measures, or of creating reports for metrics simply because the tool now makes them available, without applying any ‘intelligence’ to what will actually generate insight, is a major cause of BI failures.