All announcements and communication about the course will be available on the Piazza page (BILL 5210). Registration is mandatory for students taking the course. The access code will be sent to students by e-mail.
Research Topic |
Customer Relationship Prediction (KDD Cup 2009) |
Breast Cancer (KDD Cup 2008) |
Consumer Recommendations (KDD Cup 2007) |
Crime/fraud detection using data mining |
Comparing the Performance of Association Rule Mining Algorithms on Public Databases |
Text Mining on Twitter Streams |
Identify Which Authors Correspond to the Same Person (KDD Cup 2013 - Track 2) |
Biomedical Document and Gene Role Classification (KDD Cup 2002) |
Implementation of the gSpan Method on Network Data |
Recommendation System for Film Data Sets |
Data Classification with Multi Kernel SVM |
Traffic Sign Classification with SVM, kNN, and CNN |
Missing Value Imputation on Fırat Basin with Sparse Learning |
Application of Deep Learning on Speech Signals |
Student Performance Evaluation (KDD Cup 2010) |
Predict the Click-Through Rate of Ads Given the Query and User Information (KDD Cup 2012) |
Handwriting Recognition with LPP Method |
Heuristic Algorithm Based Dynamic Image Clustering |
Growing Hierarchical Self-Organizing Maps |
Clustering Adaptive Genetic Algorithm |
Graph-based Data Mining and gSpan |
Deep Belief Networks |
Neighbourhood-Based Collaborative Filtering Approach Using K-means Clustering |
Transductive Learning - Transductive SVM |
Conditional Inference Forest |
Multiclass Multi-kernel Relevance Vector Machines |
Adaptive Semi-supervised Learning |
Multiple-Instance Learning |
Multi-task Learning |
Reinforcement Learning - AWESOME |
Course Information Package:
BILL 5210 | Knowledge Discovery in Large Data Sets | 3+0+0 | 4
Year / Semester | Fall semester
Level of Course | Postgraduate
Status | Elective
Department | Computer Engineering
Prerequisites and co-requisites | N/A
Mode of Delivery | Face to face
Contact hours | 14 weeks
Lecturer | Asst. Prof. Dr. Murat Aykut
Co-Lecturer |
Language of instruction | Turkish
Internship | N/A
Course Syllabus
Week | Subject | Related Notes / Files
Week 1 | Basic concepts: Knowledge discovery, data mining, large data sets, data warehouses
Week 2 | Preprocessing methods: Data cleansing, handling missing attribute values, dimension reduction, discretization methods, feature extraction
Week 3 | Outlier analysis: Extreme value analysis, probabilistic models, clustering for outlier detection, distance-based outlier detection, information-theoretic models, outlier validity
Week 4 | Supervised learning: Statistical learning theory, statistical inference, regression estimation, model estimation
Week 5 | Bayesian inference, variance analysis, linear discriminant analysis, Support Vector Machines
Week 6 | Instance-based learning: Reducing the number of exemplars, pruning noisy exemplars, weighting attributes, instance-based learning methods
Week 7 | Decision trees: C4.5 algorithm, unknown attribute values, limitations of decision trees, associative classification method
Week 8 | Clustering analysis: Similarity measures, agglomerative hierarchical clustering, partitional clustering, incremental clustering, graph-based and density-based clustering
Week 9 | Midterm exam
Week 10 | Association rules: Apriori algorithm, multidimensional association rules, path-traversal pattern mining, web mining, text mining
Week 11 | Advances in data mining: Graph mining, temporal data mining, spatial data mining, distributed data mining
Week 12 | Advanced methods: Mining multi-label data, meta-learning, data mining for imbalanced data sets, ensemble methods
Week 13 | Scalable classification, rare class learning, regression modeling with numeric classes, semi-supervised learning, active learning
Week 14 | Visualization methods: Perception and visualization, scientific visualization, radial visualization, visualization using self-organizing maps
Week 15 | Term project
Week 16 | Final exam
Textbook / Material
1 | Maimon, O. and Rokach, L., Data Mining and Knowledge Discovery Handbook, Springer, 2010, 1285 pages.
Recommended Reading
2 | Kantardzic, M., Data Mining: Concepts, Models, Methods, and Algorithms, John Wiley & Sons, 2003, 343 pages.
3 | Witten, I. H. and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, 2005, 525 pages.
4 | Aggarwal, C. C., Data Mining: The Textbook, Springer, 2015, 734 pages.