Kaggle Best Notebooks: Topic-wise (Data Science and Machine Learning)
Welcome back, peeps! Today I’m gonna open Kaggle’s Pandora’s box: MY list of the best Kaggle notebooks, topic by topic, for Data Science and Machine Learning.
I have been participating in Kaggle competitions in my free time for the past 4.5 years, and it’s been an incredible learning experience. As much as I loved writing my own solutions to the problems on the platform, I also went through some of the top notebooks thoroughly to find the gems hidden within. Thanks to the amazing Kaggle community (especially the star notebooks), I have learned so much and applied those learnings at my job.
Disclaimer: This is my personal list. I’m sharing it so that people getting started in Data Science and ML don’t fall down the rabbit hole of overwhelming information out there. Remember, learning is a three-step process: first, decide what you want to learn; second, decide where you want to learn it from; and third, implement what you learned.
Let’s dive in!
Web Scraping
- https://www.kaggle.com/code/daniboy370/tutorial-web-scraping
- https://www.kaggle.com/code/dierickx3/kaggle-web-scraping-via-headless-firefox-selenium
- https://www.kaggle.com/code/digvijaysinhgohil/web-scraping-using-python
Python
Ensembling in Python
Pandas
- Star Notebook : https://www.kaggle.com/code/kashnitsky/topic-1-exploratory-data-analysis-with-pandas
- Star Notebook : https://www.kaggle.com/code/prashant111/comprehensive-data-analysis-with-pandas
- https://www.kaggle.com/code/sohier/tutorial-accessing-data-with-pandas
- https://www.kaggle.com/code/kashnitsky/a1-demo-pandas-and-uci-adult-dataset
- https://www.kaggle.com/code/ash316/learn-pandas-with-pokemons
- https://www.kaggle.com/code/frtgnn/simple-profiling-eda-using-pandas-profiling
- https://www.kaggle.com/code/corazzon/how-to-use-pandas-filter-in-survey-eda
- https://www.kaggle.com/code/shivan118/pandas-100-tricks
Data Exploration
- Star Notebook : https://www.kaggle.com/code/sudalairajkumar/simple-exploration-notebook-zillow-prize
- https://www.kaggle.com/code/pmarcelino/comprehensive-data-exploration-with-python/notebook
Data pre-processing
- Star Notebook : https://www.kaggle.com/code/ldfreeman3/a-data-science-framework-to-achieve-99-accuracy/notebook
- Star Notebook : https://www.kaggle.com/code/nkitgupta/advance-data-preprocessing
- Star Notebook : https://www.kaggle.com/code/agrawaladitya/step-by-step-data-preprocessing-eda
- https://www.kaggle.com/code/gzuidhof/full-preprocessing-tutorial
- https://www.kaggle.com/code/sudalairajkumar/getting-started-with-text-preprocessing
- https://www.kaggle.com/code/nz0722/simple-eda-text-preprocessing-jigsaw
- https://www.kaggle.com/code/smasar/tutorial-preprocessing-processing-evaluation
- https://www.kaggle.com/code/vikassingh1996/extensive-data-preprocessing-and-modeling
Text Preprocessing
- Star Notebook : https://www.kaggle.com/code/sudalairajkumar/getting-started-with-text-preprocessing
- https://www.kaggle.com/code/shashanksai/text-preprocessing-using-python
- https://www.kaggle.com/code/theoviel/improve-your-score-with-some-text-preprocessing
- https://www.kaggle.com/code/l3nnys/useful-text-preprocessing-on-the-datasets
- https://www.kaggle.com/code/balatmak/text-preprocessing-steps-and-universal-pipeline
- https://www.kaggle.com/code/srinivasav22/text-preprocessing-and-advanced-functions
- https://www.kaggle.com/code/awadhi123/text-preprocessing-using-nltk
Data Visualizations
- https://www.kaggle.com/code/andresionek/how-to-create-award-winning-data-visualizations/notebook
- https://www.kaggle.com/code/willcanniford/chocolate-bar-ratings-extensive-eda/report
- https://www.kaggle.com/code/ash316/eda-to-prediction-dietanic
- https://www.kaggle.com/code/deffro/eda-is-fun
- https://www.kaggle.com/code/gpreda/santander-eda-and-prediction
Interactive Visualizations
- Star Notebook : https://www.kaggle.com/code/tavoosi/tutorial-interactive-data-visualizations
- Star Notebook : https://www.kaggle.com/code/maheshdadhich/strength-of-visualization-python-visuals-tutorial
- https://www.kaggle.com/code/erikbruin/airbnb-the-amsterdam-story-with-interactive-maps
- https://www.kaggle.com/code/subinium/kaggle-2020-visualization-analysis
- https://www.kaggle.com/code/pranav84/kiva-loans-eda-part-1-interactive-visualizations/report
How to deal with Imbalanced Datasets
- https://www.kaggle.com/code/janiobachmann/credit-fraud-dealing-with-imbalanced-datasets/notebook
- https://www.kaggle.com/code/rafjaa/resampling-strategies-for-imbalanced-datasets
- https://www.kaggle.com/code/souravsaha1605/comprehensive-guide-on-imbalanced-data-handling
- https://www.kaggle.com/code/shahules/tackling-class-imbalance
- https://www.kaggle.com/code/suyashlakhani/credit-card-fraud-handling-imbalanced-dataset-98
Tabular Data
- Star Notebook : https://www.kaggle.com/code/vbmokin/data-science-for-tabular-data-advanced-techniques
- https://www.kaggle.com/code/manabendrarout/tabular-data-preparation-basic-eda-and-baseline
- https://www.kaggle.com/code/vbmokin/50-tips-data-science-tabular-data-for-beginner
- https://www.kaggle.com/code/vbmokin/50-advanced-tips-data-science-for-tabular-data
- https://www.kaggle.com/code/parulpandey/explainable-boosting-machines-for-tabular-data
Mathematical & Statistical Skills
- Star Notebook : https://www.kaggle.com/code/carlolepelaars/statistics-tutorial
- Star Notebook : https://www.kaggle.com/code/kanncaa1/statistical-learning-tutorial-for-beginners
- https://www.kaggle.com/code/upadorprofzs/statistical-analysis-descriptive-statistics-br
- https://www.kaggle.com/code/yashvi/practical-statistics-1-descriptive-statistics
Feature Engineering
- Star Notebook : https://www.kaggle.com/code/codename007/home-credit-complete-eda-feature-importance
- Star Notebook : https://www.kaggle.com/code/artgor/eda-feature-engineering-and-everything
- Star Notebook : https://www.kaggle.com/code/kashnitsky/topic-6-feature-engineering-and-feature-selection/notebook
- https://www.kaggle.com/code/dlarionov/feature-engineering-xgboost
- https://www.kaggle.com/code/eikedehling/feature-engineering/notebook
- https://www.kaggle.com/code/gunesevitan/titanic-advanced-feature-engineering-tutorial
- https://www.kaggle.com/code/willkoehrsen/introduction-to-manual-feature-engineering
- https://www.kaggle.com/code/willkoehrsen/automated-feature-engineering-basics
- https://www.kaggle.com/code/rejasupotaro/effective-feature-engineering
Modelling
- Star Notebook : https://www.kaggle.com/code/odins0n/spaceship-titanic-eda-27-different-models
- Star Notebook : https://www.kaggle.com/code/dansbecker/how-models-work
- https://www.kaggle.com/code/kanncaa1/feature-selection-and-data-visualization
- https://www.kaggle.com/code/artgor/eda-and-models
Model Performance
Hyperparameter Tuning
- Star Notebook : https://www.kaggle.com/code/willkoehrsen/intro-to-model-tuning-grid-and-random-search
- https://www.kaggle.com/code/ldfreeman3/a-data-science-framework-to-achieve-99-accuracy
- https://www.kaggle.com/code/prashant111/a-guide-on-xgboost-hyperparameters-tuning
XGBoost & LightGBM & CatBoost
- https://www.kaggle.com/code/kaanboke/xgboost-lightgbm-catboost-imbalanced-data
- https://www.kaggle.com/code/dansbecker/xgboost
- https://www.kaggle.com/code/eliotbarr/stacking-test-sklearn-xgboost-catboost-lightgbm
Sklearn and ML Pipeline
- Star Notebook : https://www.kaggle.com/code/kanncaa1/machine-learning-tutorial-for-beginners
- https://www.kaggle.com/code/armandsauzay/sklearn-pipelines-made-easy
- https://www.kaggle.com/code/ialimustufa/titanic-beginner-s-guide-with-sklearn
- https://www.kaggle.com/code/neviadomski/how-to-get-to-top-25-with-simple-model-sklearn
- https://www.kaggle.com/code/baghern/a-deep-dive-into-sklearn-pipelines
- https://www.kaggle.com/code/sermakarevich/sklearn-pipelines-tutorial
- https://www.kaggle.com/code/residentmario/automated-feature-selection-with-sklearn
- https://www.kaggle.com/code/qitvision/a-complete-ml-pipeline-fast-ai
- https://www.kaggle.com/code/poonaml/titanic-survival-prediction-end-to-end-ml-pipeline
- https://www.kaggle.com/code/huanvo/lyft-complete-train-and-prediction-pipeline
- https://www.kaggle.com/code/pouryaayria/a-complete-ml-pipeline-tutorial-acu-86
- Star Notebook : https://www.kaggle.com/code/dansbecker/pipelines
Naive Bayes
- Star Notebook : https://www.kaggle.com/code/prashant111/naive-bayes-classifier-in-python
- https://www.kaggle.com/code/blackblitz/gaussian-naive-bayes
- https://www.kaggle.com/code/julian3833/jigsaw-incredibly-simple-naive-bayes-0-768
- https://www.kaggle.com/code/startupsci/titanic-data-science-solutions
- https://www.kaggle.com/code/akshaysharma001/naive-bayes-with-hyperpameter-tuning
Binary Classification
- Star Notebook : https://www.kaggle.com/code/rnmehta5/pima-indian-diabetes-binary-classification
- https://www.kaggle.com/code/tanetboss/beginner-binary-classification-for-nice-movie
- https://www.kaggle.com/code/jashsheth5/binary-classification-with-sklearn-and-keras-95
Linear Regression
Logistic Regression
- https://www.kaggle.com/code/kanncaa1/logistic-regression-implementation
- https://www.kaggle.com/code/faressayah/logistic-regression-data-preprocessing
Decision Trees
- https://www.kaggle.com/code/kashnitsky/topic-3-decision-trees-and-knn
- https://www.kaggle.com/code/kashnitsky/a3-demo-decision-trees-solution
- https://www.kaggle.com/code/faressayah/decision-trees-random-forest-for-beginners
- https://www.kaggle.com/code/gauravduttakiit/hyperparameter-tuning-in-decision-trees
- https://www.kaggle.com/code/prashant111/decision-tree-classifier-tutorial
Clustering
- Star Notebook : https://www.kaggle.com/code/kashnitsky/topic-7-unsupervised-learning-pca-and-clustering/notebook
- Star Notebook : https://www.kaggle.com/code/fazilbtopal/popular-unsupervised-clustering-algorithms
- Star Notebook : https://www.kaggle.com/code/maksimeren/covid-19-literature-clustering
- https://www.kaggle.com/code/kushal1996/customer-segmentation-k-means-analysis
- https://www.kaggle.com/code/karnikakapoor/customer-segmentation-clustering
- https://www.kaggle.com/code/hellbuoy/online-retail-k-means-hierarchical-clustering
- https://www.kaggle.com/code/prashant111/k-means-clustering-with-python
- https://www.kaggle.com/code/sabanasimbutt/clustering-visualization-of-clusters-using-pca
Gradient Boosting
- https://www.kaggle.com/code/kashnitsky/topic-10-gradient-boosting/notebook
- https://www.kaggle.com/code/ambrosm/tpsmay22-gradient-boosting-quickstart
- https://www.kaggle.com/code/grroverpr/gradient-boosting-simplified
K-Nearest Neighbors
- Star Notebook : https://www.kaggle.com/code/kashnitsky/topic-3-decision-trees-and-knn/notebook
- Star Notebook : https://www.kaggle.com/code/shrutimechlearn/step-by-step-diabetes-classification-knn-detailed
- https://www.kaggle.com/code/prashant111/knn-classifier-tutorial
- https://www.kaggle.com/code/cdeotte/mnist-perfect-100-using-knn
- https://www.kaggle.com/code/mgabrielkerr/visualizing-knn-svm-and-xgboost-on-iris-dataset
Support Vector Machines
- Star Notebook : https://www.kaggle.com/code/nirajvermafcb/support-vector-machine-detail-analysis
- https://www.kaggle.com/code/faressayah/support-vector-machine-pca-tutorial-for-beginner
- https://www.kaggle.com/code/arshid/support-vector-machine-on-iris-flower-dataset
- https://www.kaggle.com/code/codeblogger/step-by-step-support-vector-machine-svm
Happy learning and Kaggling :)
Follow for more updates and stay tuned. And of course, let me end this post with a quote by Steve Jobs ;)
“Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven’t found it yet, keep looking. Don’t settle. As with all matters of the heart, you’ll know when you find it.”