The course will enable students to grasp the fundamentals of big data, the Hadoop ecosystem, NoSQL databases, and machine learning, with practical implementation in Python.
Course Code: 24MBB423
Course Title: Big Data and Data Analytics (Practical)

Learning outcome (at course level):
CO737: Formulate a problem and an abstract model to handle Big Data in business domain.
CO738: Apply Big Data tool/s like Hadoop for business analytics.
CO739: Develop a data store to handle massive business data using Big Data tools and generate queries.
CO740: Build a machine learning model on Big Data for business problems.
CO741: Examine the outcomes of Big Data based machine learning models and communicate the results.
CO742: Contribute effectively in course-specific interaction.

Learning and teaching strategies:
Approach in teaching: Interactive Lectures, Group Discussion, Tutorials, Case Study
Learning activities for the students: Self-learning assignments, presentations

Assessment Strategies:
Class test, Semester end examinations, Quiz, Assignments, Presentation
Digital data and its classification, characteristics of data, evolution and definition of big data. Challenges with big data, why big data, traditional business intelligence versus big data.
Big Data Analytics
What is big data analytics, why the sudden hype around big data analytics, classification of analytics, top challenges facing big data, terminologies used in big data environment, top analytics tools.
Apache Hadoop, why Hadoop, comparison with other systems: RDBMS, grid computing. Hadoop overview, HDFS and its ecosystem, Hadoop architecture and Hadoop 2.x core components. Managing resources and applications with Hadoop YARN (Yet Another Resource Negotiator), understanding MapReduce programming, running a sample MapReduce program, executing MapReduce applications: word count, TeraSort, radix sort.
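The word count application listed above is the usual first MapReduce exercise. The following is a minimal single-machine sketch of its three phases (map, shuffle, reduce) in plain Python. It does not use Hadoop itself; the function names and the sample input are assumptions chosen for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the course material.
    sample = ["big data big analytics", "data drives analytics"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run on different nodes and the shuffle is handled by the framework; the sketch only mirrors the data flow.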
Introduction to Hadoop Ecosystem, Pig, Hive, Sqoop, HBase.
Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Pig on Hadoop.
Hive: Hive Shell, Architecture, data types, Comparison with Traditional Databases, HiveQL, Tables, User Defined Functions.
Use of NoSQL, Types of NoSQL, Advantages of NoSQL. Use of NoSQL in Industry, NoSQL Vendors, SQL versus NoSQL, NewSQL.
HBase: HBase basics, Concepts, Clients, Example, HBase versus RDBMS.
Machine Learning using Python, Python installation (Windows and Ubuntu), Execution modes of Python, Executing Python programs on Hadoop. Python Libraries and Tools: Pandas for data analysis, Matplotlib for data visualization, NumPy for matrix processing, SciPy for image manipulation. Applications of Machine Learning, Implementation of machine learning in Hadoop environment.
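One common way to execute Python programs on Hadoop is Hadoop Streaming, where the mapper and reducer are scripts that read lines from standard input and write tab-separated key/value lines to standard output, with the framework sorting by key in between. The sketch below imitates that contract locally. The `region,amount` record layout, the function names, and the sample values are hypothetical and only serve the illustration.

```python
from itertools import groupby


def mapper(lines):
    """Turn 'region,amount' records into 'region<TAB>amount' lines."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # skip malformed records
        region, amount = parts
        yield f"{region}\t{amount}"


def reducer(sorted_lines):
    """Sum amounts per region; the input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for region, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(float(amount) for _, amount in group)
        yield f"{region}\t{total}"


def run_locally(lines):
    """Simulate map -> sort -> reduce on one machine."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical sample records, not taken from the course material.
    sample = ["north,10.5", "south,4", "north,2.5"]
    for output_line in run_locally(sample):
        print(output_line)
```

Under Hadoop Streaming the mapper and reducer would normally live in separate scripts that iterate over `sys.stdin`; keeping them as functions over iterables makes the logic easy to test before submitting a job.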
*Case studies related to all topics are to be taught.
Seema Acharya and Subhashini Chellappan, "Big Data and Analytics", Wiley, 2015.
Michael Minelli, Michele Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013.
P. J. Sadalage and M. Fowler, "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence", Addison-Wesley Professional, 2012.
Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilly, 2012.
Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilly, 2012.
Suggested readings
Lars George, "HBase: The Definitive Guide", O'Reilly, 2011.
Andreas C. Müller and Sarah Guido, "Introduction to Machine Learning with Python: A Guide for Data Scientists", O'Reilly Media, 2016.
E resources
https://onlinecourses.nptel.ac.in/noc20_mg22/preview
https://alison.com/tag/financial-analysis
Journals
https://vciba.springeropen.com/
https://appliednetsci.springeropen.com/