Fall 2025 Introduction to Data Science
Image generated by DALL·E 3 with the prompt "Massive Data Mining"
Course Overview
This course introduces the fundamental concepts, techniques, and applications of data mining, with an emphasis on analyzing and extracting knowledge from massive and complex datasets. Students will learn core methods including classification, association analysis, clustering, and anomaly detection, as well as strategies for evaluating and validating models to ensure reliability. The course will also address challenges unique to large-scale data, such as scalability, distributed frameworks, and avoiding false discoveries. Through lectures, readings, and hands-on exercises, students will gain both theoretical understanding and practical experience in applying data mining methods to real-world problems in science, engineering, business, and healthcare.
5 Quizzes
3 Research Papers (each also assessed through a quiz based on the paper)
Midterm Exam
Final Semester Project
Week | Title | Topics
Week 1 | Introduction to Data Mining | Course Overview and Introduction; Environment Setup and Tools; Introduction to Python for Data Science
Week 2 | Data and Data Preparation | Types of data, data quality, preprocessing (normalization, sampling, dimensionality reduction).
Week 3 | Classification: Decision Trees | Tree construction, splitting criteria (Gini, information gain), pruning.
Week 4 | Model Evaluation & Overfitting | Training vs. testing, cross-validation, bias–variance tradeoff, ROC/AUC.
Week 5 | Rule-Based Classifiers & k-NN | IF-THEN rules, covering vs. decision tree rules, nearest neighbor methods.
Week 6 | Naïve Bayes & Neural Networks | Bayes theorem, independence assumption, perceptrons, feed-forward NNs.
Week 7 | SVM, Ensemble Methods, Class Imbalance | Large-margin classifiers, bagging, boosting, random forests, rebalancing techniques.
Week 8 | Association Analysis: Basics | Market basket analysis, frequent itemset mining, Apriori, support & confidence.
Week 9 | Association Analysis: Advanced | Sequential patterns, graph patterns, interestingness measures.
Week 10 | Clustering: Partitional Methods | K-means, initialization, distance measures, evaluation metrics.
Week 11 | Clustering: Advanced Methods | Hierarchical clustering, DBSCAN, spectral clustering, cluster validation.
Week 12 | Anomaly Detection | Statistical, distance-based, density-based, clustering-based approaches.
Week 13 | Avoiding False Discoveries | Hypothesis testing, multiple comparisons, p-values, reproducibility.
Week 14 | Scalability & Big Data Mining | Scalability issues, distributed data mining, streaming data, wrap-up.