Probability concepts form the foundation for statistics.
A formal definition of probability:
The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times.
This is a corollary of the law of large numbers:
As more observations are collected, the proportion of occurrences with a particular outcome converges to the probability of that outcome.
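To see the law of large numbers in action (a minimal sketch using Python's standard library; the fair coin and its 0.5 probability are assumed for illustration), simulating coin flips shows the proportion of heads settling toward the true probability as the number of flips grows:

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

def running_proportion(n_flips):
    """Flip a fair coin n_flips times and return the proportion of heads."""
    heads = 0
    for _ in range(n_flips):
        heads += random.random() < 0.5  # True (1) counts as a head
    return heads / n_flips

# The proportion gets closer to the true probability 0.5 as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, round(running_proportion(n), 4))
```

With only 100 flips the proportion can stray noticeably from 0.5; by a million flips it is very close, which is exactly the convergence the law describes.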
Disjoint (mutually exclusive) events are events that cannot both happen at the same time.
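For instance (a hedged sketch; the die example is my own, not from the text), on a fair six-sided die the events "roll a 1 or 2" and "roll a 6" are disjoint, so the probability that either occurs is simply the sum of their probabilities:

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # sample space of a fair six-sided die
A = {1, 2}                   # event A: roll a 1 or a 2
B = {6}                      # event B: roll a 6 (disjoint from A)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

assert A & B == set()  # disjoint: A and B share no outcomes
# Addition rule for disjoint events: P(A or B) = P(A) + P(B)
assert prob(A | B) == prob(A) + prob(B)
```

The addition rule in this simple form only holds because the events are disjoint; overlapping events would double-count their intersection.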
The BIguru BI Blog app is now available on the Amazon AppStore!
To search and download the app, go to the Amazon AppStore and search for “Biguru BI Blog”.
To download and install, follow Amazon's instructions for your Android smartphone; in particular, you'll need to "enable unknown sources" as outlined by Amazon.
Once you've downloaded and installed it by accepting the defaults (your smartphone's anti-virus should scan the app after installation), you're free to get updates on new posts from this blog!
What is Statistics?
Collected observations are called data. Statistics is the study of how best to collect, analyze, and draw conclusions from data. Each observation in data is called a case. Characteristics of the case are called variables. With a matrix/table analogy, a case is a row while a variable is a column.
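To make the matrix/table analogy concrete (a small illustrative sketch; the dataset is invented), each dictionary below is one case (a row) and each key is a variable (a column):

```python
# A tiny dataset: each element is one case (one observed unit).
data = [
    {"name": "Alice", "age": 34, "city": "Pune"},
    {"name": "Bob",   "age": 28, "city": "Mumbai"},
    {"name": "Carol", "age": 45, "city": "Delhi"},
]

cases = len(data)                 # number of rows (cases)
variables = list(data[0].keys())  # column names (variables)

print(f"{cases} cases, variables: {variables}")

# Pulling one column, i.e. one variable's value across all cases:
ages = [case["age"] for case in data]
print(ages)
```

Libraries such as pandas formalise exactly this picture: a DataFrame is the table, its rows are the cases, and its columns are the variables.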
Statistics - Correlation (Courtesy: xkcd.com)
Types of variables:
Numerical - Can be discrete or continuous, and can take a wide range of numerical values.
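As a rough illustration (the example variables are my own, not the text's), a discrete numerical variable takes countable values such as counts, while a continuous one can take any value within a range:

```python
# Discrete numerical variable: number of children per household (countable values).
num_children = [0, 2, 1, 3, 2]

# Continuous numerical variable: height in centimetres (any value in a range).
heights_cm = [162.5, 178.04, 171.33]

# Discrete values land on whole numbers; continuous values generally do not.
assert all(isinstance(x, int) for x in num_children)
assert all(isinstance(x, float) for x in heights_cm)
```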
This is the fourth part of a series of posts on big data. Read the previous posts here: Part-1, Part-2 and Part-3.
With the ongoing data explosion, and the improvement in technologies able to deal with it, businesses are leveraging big data to mine insights, gain competitive advantage, reinvent business models, and create new markets.
A huge share of these "big data" volumes comes from system logs, user-generated content on social media such as Twitter or Facebook, sensor data, and the like.
This is the 3rd part of a series of posts on Big Data. Read Part-1 (What is Big Data) and Part-2 (Hadoop).
Traditionally, data warehouses have been built with relational databases as the backbone. With the new challenges (3Vs) of Big Data, relational databases have been falling short of the requirements of handling:
New data types (unstructured data)
Extended analytic processing
Throughput (TB/hour loading) with immediate query access
The industry has turned to Hadoop as a disruptive solution for these very challenges.
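To give a flavour of the style of processing Hadoop enables (a pure-Python sketch of the MapReduce pattern, not Hadoop's actual API; the log lines are invented), counting log levels in system logs with separate map and reduce phases looks like this:

```python
from collections import defaultdict

# Sample system-log lines (invented for illustration).
logs = [
    "2014-01-01 ERROR disk full",
    "2014-01-01 INFO backup started",
    "2014-01-02 ERROR timeout",
    "2014-01-02 WARN slow query",
    "2014-01-03 ERROR disk full",
]

def map_phase(line):
    """Emit a (key, 1) pair keyed by the log level of one line."""
    level = line.split()[1]
    return (level, 1)

def reduce_phase(pairs):
    """Sum the counts per key, as a reducer would."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

counts = reduce_phase(map_phase(line) for line in logs)
print(counts)
```

In a real Hadoop job the map phase would run in parallel across many machines, each processing a shard of the logs, and the framework would shuffle the emitted pairs to reducers by key; that parallelism is what lets it handle unstructured data at TB/hour scale.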