20773: Analyzing Big Data with Microsoft R Training & Certification Course
Overview
This course teaches students to use Microsoft R Server to create and run analyses on large datasets, and to apply it in big data environments such as a Hadoop or Spark cluster or a SQL Server database.
Audience profile
The primary audience for this course is people who wish to analyze large datasets within a big data environment.
The secondary audience is developers who need to integrate R analyses into their solutions.
After completing this course, students will be able to:
- Explain how Microsoft R Server and Microsoft R Client work
- Use R Client with R Server to explore big data held in different data stores
- Visualize data by using graphs and plots
- Transform and clean big data sets
- Implement options for splitting analysis jobs into parallel tasks
- Build and evaluate regression models generated from big data
- Create, score, and deploy partitioning models generated from big data
- Use R in the SQL Server and Hadoop environments
Prerequisites
- Programming experience using R, and familiarity with common R packages
- Knowledge of common statistical methods and data analysis best practices
- Basic knowledge of the Microsoft Windows operating system and its core functionality
Full Description
Module 1 - MICROSOFT R SERVER AND R CLIENT
- What is Microsoft R Server
- Using Microsoft R Client
- The ScaleR functions
- Lab: Exploring Microsoft R Server and Microsoft R Client
Module 2 - EXPLORING BIG DATA
- Understanding ScaleR data sources
- Reading data into an XDF object
- Summarizing data in an XDF object
- Lab: Exploring Big Data
Module 3 - VISUALIZING BIG DATA
- Visualizing in-memory data
- Visualizing big data
- Lab: Visualizing data
Module 4 - PROCESSING BIG DATA
- Transforming Big Data
- Managing datasets
- Lab: Processing big data
Module 5 - PARALLELIZING ANALYSIS OPERATIONS
- Using the RxLocalParallel compute context with rxExec
- Using the RevoPemaR package
- Lab: Using rxExec and RevoPemaR to parallelize operations
Module 6 - CREATING AND EVALUATING REGRESSION MODELS
- Clustering Big Data
- Generating regression models and making predictions
- Lab: Creating a linear regression model
Module 7 - CREATING AND EVALUATING PARTITIONING MODELS
- Creating partitioning models based on decision trees
- Testing partitioning models by making and comparing predictions
- Lab: Creating and evaluating partitioning models
Module 8 - PROCESSING BIG DATA IN SQL SERVER AND HADOOP
- Using R in SQL Server
- Using Hadoop Map/Reduce
- Using Hadoop Spark
- Lab: Processing big data in SQL Server and Hadoop
Fees & Schedule
Delivery Mode | Course Duration | Fees |
---|---|---|
Live Virtual Training | 3 Days | Ask for Quote |
Onsite Classroom Training | 3 Days | Ask for Quote |
Customized Training | 3 Days | Ask for Quote |