Spring 2025 Introduction to Data Science

Lectures: Tuesday and Thursday, 2:00-3:20pm, Beck Hall, Room 100 (Livingston campus)

Instructor: Ruixiang Tang

Office Hours: Friday, 3:00-4:00pm, Hill Center, Room 416

Course Overview

This course offers an introductory yet comprehensive exploration of data science, focusing on foundational techniques and tools for extracting meaningful insights from data. Designed for students new to the field, the course covers essential data science methodologies, including data preprocessing, classification, clustering, association analysis, and anomaly detection. Emphasis is placed on understanding the fundamental concepts and gaining practical skills through real-world examples.

The course progresses from the basics of classification (decision trees, rule-based classifiers, nearest neighbor methods, and Naïve Bayes) to more advanced techniques such as artificial neural networks, support vector machines, and ensemble methods. Key issues in model evaluation, including overfitting and class imbalance, are also covered so that students can build models whose results are reliable.

Grading


Course Schedule (tentative)

Week

Topic

Notes

Recommended Papers for Further Reading

Week 1

Introduction

Overview of the key topics covered in the course: data preprocessing, classification, association analysis, cluster analysis, and anomaly detection.

Week 2

Data

Importance of data quality and preparation in ML. Techniques for data preprocessing and transformation.

Week 3

Classification: Basic Concepts and Techniques

Basic concepts, decision trees, and model overfitting.

Week 4-6

Classification: Alternative Techniques

Rule-Based Classifiers

Nearest Neighbor Classifiers

Naïve Bayes Classifier

Artificial Neural Networks

Support Vector Machines

Ensemble Methods

Class Imbalance Problem

Week 7

Association Analysis: Basic Concepts and Algorithms

Identifying relationships between data items.

Week 8

Midterm Exam

Week 9

Association Analysis: Advanced Concepts

Techniques for complex association pattern mining.

Week 10

Cluster Analysis: Basic Concepts and Algorithms

Grouping data based on similarity.

Week 11

Cluster Analysis: Additional Issues and Algorithms

Advanced clustering techniques and challenges.

Week 12

Anomaly Detection

Methods for detecting unusual patterns or outliers in data, important for applications like fraud detection and security.

Week 13

Avoiding False Discoveries

Techniques for ensuring statistical validity and avoiding misleading results in ML research.

Week 14

Large Language Models

Using large language models for data science tasks such as data labeling and data generation.

Week 15

Final Exam