data science

Data processing with Spark in R & Python

I recently gave a talk on data processing with Apache Spark using R and Python. tl;dr: the slides and presentation can be accessed below (free registration required). As noted in my previous post, Spark has become the de facto standard for big data applications and has been adopted quickly by the industry. See the blog post on Cloudera’s One Platform initiative by CEO Mike Olson for their commitment to Spark. In data science, R has seen rapid adoption, not only because it is open source and free compared to costly SAS, but also because of the huge number of statistical and graphical packages it provides.

An introduction to Data Science

I presented a talk last week introducing Data Science and associated topics to some enthusiasts. Here’s a slide deck I created quickly with markdown using Swipe, a start-up building HTML5 presentation tools. The contents include: data scientist skills; data science enablers and barriers; big data analytics; data science lifecycle; use cases; tools and technology; project approach; machine learning; skills and roles; learning resources. Here are the slides: