Spring 2025 Introduction to Data Science
Lectures: Tuesday and Thursday, 2:00-3:20pm, Beck Hall, Room 100, on Livingston
Instructor: Ruixiang Tang
Office Hours: Friday 3:00-4:00pm, Hill Center, Room 416
Course Overview
This course offers an introductory yet comprehensive exploration of data science, focusing on foundational techniques and tools for extracting meaningful insights from data. Designed for students new to the field, the course covers essential data science methodologies, including data preprocessing, classification, clustering, association analysis, and anomaly detection. Emphasis is placed on understanding the fundamental concepts and gaining practical skills through real-world examples.
The course progresses from the basics of classification, discussing decision trees, rule-based classifiers, nearest neighbor methods, and Naïve Bayes, to more advanced techniques such as artificial neural networks, support vector machines, and ensemble methods. Key considerations in model evaluation, including model overfitting and handling class imbalances, are addressed to ensure reliable analysis.
Grading
70% Homework
10% Midterm Exam
20% Final Exam
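As a rough illustration of how these weights might combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale and that the final grade is a simple weighted average; the function name `weighted_grade` and the sample scores are hypothetical and not part of the official grading policy.

```python
# Hypothetical sketch: combine component scores using the syllabus weights.
# Assumes every score is on a 0-100 scale and the weights apply linearly.

WEIGHTS = {"homework": 70, "midterm": 10, "final": 20}  # percent weights from the Grading section


def weighted_grade(homework: float, midterm: float, final: float) -> float:
    """Return the overall course score on a 0-100 scale."""
    scores = {"homework": homework, "midterm": midterm, "final": final}
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / 100


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(weighted_grade(homework=90, midterm=80, final=85))
```

Because homework carries 70% of the weight, a change in the homework average moves the overall score far more than the same change on either exam.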
Course Schedule (tentative)
| Week# | Topic | Notes | Recommended Papers for Further Reading |
| Week 1 | Introduction | Key topics covered in the course, including data processing, classification, association analysis, cluster analysis, and anomaly detection. | |
| Week 2 | Data | Importance of data quality and preparation in ML. Techniques for data preprocessing and transformation. | |
| Week 3 | Classification: Basic Concepts and Techniques | Basic concepts, decision trees, and model overfitting. | |
| Weeks 4-6 | Classification: Alternative Techniques | Rule-based classifiers, nearest neighbor classifiers, Naïve Bayes classifier, artificial neural networks, support vector machines, ensemble methods, and the class imbalance problem. | |
| Week 7 | Association Analysis: Basic Concepts and Algorithms | Identifying relationships between data items. | |
| Week 8 | Midterm Exam | | |
| Week 9 | Association Analysis: Advanced Concepts | Techniques for complex association pattern mining. | |
| Week 10 | Cluster Analysis: Basic Concepts and Algorithms | Grouping data based on similarity. | |
| Week 11 | Cluster Analysis: Additional Issues and Algorithms | Advanced clustering techniques and challenges. | |
| Week 12 | Anomaly Detection | Methods for detecting unusual patterns or outliers in data, important for applications like fraud detection and security. | |
| Week 13 | Avoiding False Discoveries | Techniques for ensuring statistical validity and avoiding misleading results in ML research. | |
| Week 14 | Large Language Models | Large language models for data science tasks, such as data labeling and data generation. | |
| Week 15 | Final Exam | | |