All announcements and communication about the course will be posted on the Piazza page (BILL 5210). Registration is mandatory for students taking the course. The access code will be sent to students by e-mail.

Research Topics
Customer Relationship Prediction (KDD Cup 2009)
Breast Cancer (KDD Cup 2008)
Consumer Recommendations (KDD Cup 2007)
Crime/fraud detection using data mining
Comparing the Performance of Association Rule Mining Algorithms on Public Databases
Text Mining on Twitter Streams
Identify Which Authors Correspond to the Same Person (KDD Cup 2013 - Track 2)
BioMed document; plus gene role classification (KDD Cup 2002)
Implementation of the gSpan Method on Network Data
Recommendation System for film data sets
Data Classification with Multi Kernel SVM
Traffic Sign Classification with SVM, kNN, and CNN
Missing Value Imputation on Fırat Basin with Sparse Learning
Application of Deep Learning on Speech Signals
Student Performance Evaluation (KDD Cup 2010)
Predict the Click-Through Rate of Ads Given the Query and User Information (KDD Cup 2012)
Handwriting Recognition with LPP Method
Heuristic Algorithm Based Dynamic Image Clustering
Growing Hierarchical Self-Organizing Maps
Clustering Adaptive Genetic Algorithm
Graph-based Data Mining and gSpan
Deep Belief Networks
Neighbourhood-Based Collaborative Filtering Approach Using K-means Clustering
Transductive Learning - Transductive SVM
Conditional Inference Forest
Multiclass Multi-kernel Relevance Vector Machines
Adaptive Semi-supervised Learning 
Multiple-Instance Learning
Multi-task Learning
Reinforcement Learning - AWESOME

Course Information Package:

BILL 5210 Knowledge Discovery in Large Data Sets 3+0+0 4
Year / Semester: Fall semester
Level of Course: Postgraduate
Status: Elective
Department: Computer Engineering
Prerequisites and co-requisites: N/A
Mode of Delivery: Face to face
Contact hours: 14 weeks
Lecturer: Asst. Prof. Dr. Murat Aykut
Co-Lecturer:
Language of instruction: Turkish
Internship: N/A
 
Course Syllabus
Week Subject Related Notes / Files
Week 1 Basic concepts – Knowledge discovery, data mining, large data sets, data warehouse  
Week 2 Preprocessing Methods: Data cleansing, handling missing attribute values, dimension reduction, discretization methods, feature extraction  
Week 3 Outlier analysis: Extreme value analysis, probabilistic models, clustering for outlier detection, distance-based outlier detection, information-theoretic models, outlier validity  
Week 4 Supervised learning – Statistical learning theory, statistical inference, regression estimation, model estimation
Week 5 Bayesian inference, variance analysis, linear discriminant analysis, Support Vector Machines  
Week 6 Instance-based learning – Reducing the number of exemplars, pruning noisy exemplars, weighting attributes, instance-based learning methods  
Week 7 Decision trees – C4.5 algorithm, unknown attribute values, limitations of decision trees, associative classification method  
Week 8 Clustering analysis – Similarity measures, agglomerative hierarchical clustering, partitional clustering, incremental clustering, graph-based and density-based clustering
Week 9 Midterm exam  
Week 10 Association rules – Apriori algorithm, multidimensional association rules, mining path, web mining, text mining
Week 11 Advances in Data Mining: Graph mining, temporal data mining, spatial data mining, distributed data mining  
Week 12 Advanced Methods: mining multi-label data, meta-learning, data mining for imbalanced datasets, ensemble methods  
Week 13 Scalable classification, rare class learning, regression modeling with numeric classes, semi-supervised learning, active learning
Week 14 Visualization methods: Perception and visualization, scientific visualization, radial visualization, visualization using self-organizing maps
Week 15 Term project  
Week 16 Final exam  
Textbook / Material
1 Maimon, O. and Rokach, L., Data Mining and Knowledge Discovery Handbook, Springer, 2010, 1285 pages.
Recommended Reading
2 Kantardzic, M., Data Mining: Concepts, Models, Methods, and Algorithms, John Wiley & Sons, 2003, 343 pages.
3 Witten, I. H. and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, 2005, 525 pages.
4 Aggarwal, C. C., Data Mining: The Textbook, Springer, 2015, 734 pages.