Name: Data Science Overview - Renad AL Majed For Training
Price: 10.00 USD
Availability: InStock

Data Science Overview

The Data Science Overview | Technologies, Tools & Modern Roles in the Data-Driven Enterprise is an introductory-level course that acquaints the entire multi-disciplinary Data Science team with the many evolving and related terms, with a focus on Big Data, Data Science, Predictive Analytics, Artificial Intelligence, Data Mining, and Data Warehousing. The overview explores the current state of the art and science, the major components of a modern data science infrastructure, team roles and responsibilities, and realistic level-setting of possible outcomes for your investment. The goal of this course is to provide students with a baseline understanding of core concepts that can serve as a platform of knowledge for more in-depth training and real-world practice.

Description

Objectives

This course provides a high-level view of a variety of core, current data-science-related technologies, strategies, skillsets, initiatives, and supporting tools in common business enterprise practice. The list covers a general range of topics current at the time of course distribution. We will collaborate with your team to refine the depth of coverage, identify the areas of greatest importance to your team, determine where you would like to add demos, and so on. Students will explore:

Foundations: Grids & Virtualization; SOA, ESB / EMB, The Cloud
The Hadoop Ecosystem: HDFS; Resource Negotiators, MapReduce, Spark, Distributions
Big Data, NOSQL, and ETL
ETL: Extract, Transform, Load
Handling Data & a Survey of Useful Tools
Enterprise Integration Patterns and Message Busses
Developing in the Hadoop Ecosystem: R, Python, Java, Scala, Pig, and BPMN
Artificial Intelligence and Business Systems
Who’s on the Team? Evolving Roles and Functions in Data Science
Growing your Infrastructure

Outline

Foundations

Grids and Virtualization
Service-Oriented Architecture
Enterprise Service Bus
Enterprise Message Bus
The Cloud

The Hadoop Ecosystem

HDFS: Hadoop Distributed File System
Resource Negotiators: YARN, Mesos, and Spark; ZooKeeper
Hadoop Map/Reduce
Spark
Hadoop Ecosystem Distributions: Cloudera, Hortonworks, Open Source

Big Data, NOSQL, and ETL

Big Data vs. RDBMS
NOSQL: Not Only SQL
Relational Databases: Oracle, MariaDB, Db2, SQL Server, PostgreSQL
Key/Value Databases: JBoss Infinispan, Terracotta, Dynamo, Voldemort
Columnar Databases: Cassandra, HBase, Bigtable
Document Databases: MongoDB, CouchDB/Couchbase
Graph Databases: Giraph, Neo4j, GraphX
Apache Hive
Common Data Formats
Leveraging SQL and SQL variants

ETL: Extract, Transform, Load

Data Ingestion, Transformation, and Loading
Exporting Data
Sqoop, Flume, Informatica, and other tools

Enterprise Integration Patterns and Message Busses

Enterprise Integration Patterns: Apache Camel and Spring Integration
Enterprise Message Busses: Apache Kafka, ActiveMQ, and other tools

Developing in the Hadoop Ecosystem

Languages: R, Python, Java, Scala, Pig, and BPMN
Libraries and Frameworks
Development, Testing, and Deployment

Artificial Intelligence and Business Systems

Artificial Intelligence: Myths, Legends, and Reality
The Math
Statistics
Probability
Clustering Algorithms, Mahout, MLlib, scikit-learn, and MADlib
Business Rule Systems: Drools, JRules, Pegasus

The Team

Agile Data Science
NOSQL Data Architects and Administrators
Developers
Grid Administrators
Business and Data Analysts
Management
Evolving your Team
Growing your Infrastructure

Audience

This introductory-level primer course is an overview intended for Business Analysts, Data Analysts, Data Architects, DBAs, Network (Grid) Administrators, Developers, or anyone else in the data science realm who needs a baseline understanding of some of the core areas of modern Data Science technologies, practices, and available tools.