cm012 - February 15, 2017

  • Discuss the need for distributed computing
  • Illustrate the split-apply-combine analytical pattern
  • Define parallel processing
  • Define SQL
  • Demonstrate how to access local and remote SQL databases
  • Introduce Hadoop and Spark as distributed computing platforms
  • Introduce the sparklyr package
  • Demonstrate how to use sparklyr for machine learning using the Titanic data set

To do for Monday

This work is licensed under the CC BY-NC 4.0 Creative Commons License.