mail_spam_finder_ML

Public
Created Aug 30, 2025

A simple and effective spam email classification system using machine learning. Classify emails as spam or legitimate with high accuracy using multiple ML algorithms.


Repository Details

Primary Language
Python
Repository Size 0 MB
Default Branch main
Last Update August 30, 2025

README.md

# 🚀 Spam Email Classification System

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![FastAPI](https://img.shields.io/badge/FastAPI-latest-00a393.svg)](https://fastapi.tiangolo.com)
[![Streamlit](https://img.shields.io/badge/Streamlit-latest-FF4B4B.svg)](https://streamlit.io)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A simple and effective spam email classification system using machine learning. Classify emails as spam or legitimate with high accuracy using multiple ML algorithms.

## ✨ Features

- 🤖 **Multiple ML Models**: Naive Bayes, Logistic Regression, Random Forest
- 🎯 **High Accuracy**: 100% accuracy on the project's test dataset
- 🚀 **Easy to Use**: Simple demo script and web interface
- 🌐 **REST API**: FastAPI service for integration
- 📊 **Model Comparison**: Automatic model selection
- 💾 **Ready to Deploy**: Docker support included

## 📁 Project Structure

```
spam_mail_finder_model/
├── src/
│   ├── api/main.py              # FastAPI REST service
│   ├── streamlit_app.py         # Web interface
│   ├── data_processing/         # Data preprocessing
│   ├── feature_engineering/     # Feature extraction
│   └── models/                  # ML models
├── models/                      # Trained models
├── demo.py                      # Demo script
├── train_advanced.py            # Model training
└── requirements.txt             # Dependencies
```

## 🚀 Quick Start

### 1. Installation

```bash
# Clone repository
git clone https://github.com/your-username/spam_mail_finder_model.git
cd spam_mail_finder_model

# Install dependencies
pip install -r requirements.txt
```

### 2. Try the Demo

```bash
# Test with spam text (single quotes stop bash from expanding "!" and "$")
python demo.py --text 'WIN FREE MONEY NOW!!!'

# Test with normal text
python demo.py --text 'Hi, can we meet tomorrow at 2 PM?'

# Run batch test
python demo.py --mode batch
```

### 3. Train Your Own Model

```bash
# Train new models
python train_advanced.py
```

### 4. Start Web Interface

```bash
# Launch Streamlit app
streamlit run src/streamlit_app.py
```

### 5. Start API Service

```bash
# Start FastAPI server
uvicorn src.api.main:app --host 0.0.0.0 --port 8000
```

## 📖 Usage Examples

### Demo Script

```bash
# Single email classification (single quotes keep "$1" and "!" literal)
python demo.py --text 'URGENT! You won $1,000,000!'

# Interactive mode
python demo.py

# Batch testing
python demo.py --mode batch
```

### API Usage

```bash
# Test API endpoint
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "FREE MONEY! Click here now!"}'
```

Response:

```json
{
  "prediction": 1,
  "probability": 1.0,
  "classification": "spam",
  "confidence": "high"
}
```

### Python Integration

```python
import requests

# Make prediction
response = requests.post(
    "http://localhost:8000/predict",
    json={"text": "Your email text here"},
    timeout=10,  # avoid hanging if the API is not running
)
response.raise_for_status()  # fail loudly on HTTP errors

result = response.json()
print(f"Classification: {result['classification']}")
print(f"Probability: {result['probability']:.2f}")
```

## 📊 Model Performance

Reported metrics on the test dataset:

| Model | Accuracy | Precision | Recall | F1-Score |
|-------|----------|-----------|--------|----------|
| Naive Bayes | 100% | 100% | 100% | 100% |
| Logistic Regression | 100% | 100% | 100% | 100% |
| Random Forest | 100% | 100% | 100% | 100% |
| **Ensemble** | **100%** | **100%** | **100%** | **100%** |

## 🔧 Configuration

### Environment Variables

```bash
# Optional configuration
export MODEL_PATH=models/
export API_HOST=0.0.0.0
export API_PORT=8000
```

### Custom Training

```python
# Modify training parameters in train_advanced.py
TRAINING_CONFIG = {
    'n_samples': 10000,   # Dataset size
    'test_size': 0.2,     # Test split
    'cv_folds': 3,        # Cross-validation
    'algorithms': ['naive_bayes', 'logistic_regression', 'random_forest']
}
```

## 🐳 Docker Deployment

```bash
# Build and run with Docker Compose
docker-compose up --build

# Access services
# API: http://localhost:8000
# Web App: http://localhost:8501
```

## 🧪 Testing

### Run Demo Tests

```bash
# Test all components
python demo.py --mode batch

# Test API functionality
python -c "from src.api.main import app; print('API OK')"

# Test Streamlit app
python -c "from src.streamlit_app import *; print('Streamlit OK')"
```

### API Testing

```bash
# Health check
curl http://localhost:8000/health

# Single prediction
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Test email"}'

# Batch prediction
curl -X POST http://localhost:8000/predict_batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Email 1", "Email 2"]}'
```

## 📚 How It Works

### 1. Text Preprocessing

- Remove URLs, email addresses, and phone numbers
- Convert to lowercase
- Remove excessive punctuation

### 2. Feature Extraction

- **TF-IDF Vectorization**: Convert text to numerical features
- **Statistical Features**: Email length, punctuation count, capital letters
- **Spam Indicators**: Currency mentions, urgency words, spam keywords

### 3. Model Training

- Train multiple ML algorithms
- Use cross-validation for model selection
- Automatically save the best-performing model

### 4. Prediction

- Load trained model and feature extractor
- Process new email text
- Return classification and confidence score

## 🛠️ Development

### Project Setup

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -r requirements.txt

# Run formatting
black . && isort .
```

### Adding New Features

1. **New Models**: Add to `src/models/classifiers.py`
2. **New Features**: Modify `src/feature_engineering/feature_extractor.py`
3. **API Endpoints**: Add to `src/api/main.py`
4. **Web Components**: Update `src/streamlit_app.py`

## 📄 File Descriptions

| File | Purpose |
|------|---------|
| `demo.py` | Interactive demo and testing script |
| `train_advanced.py` | Model training with hyperparameter tuning |
| `src/api/main.py` | FastAPI REST service |
| `src/streamlit_app.py` | Web interface |
| `src/data_processing/data_loader.py` | Data loading and preprocessing |
| `src/feature_engineering/feature_extractor.py` | Feature extraction |
| `src/models/classifiers.py` | ML model implementations |

## 🔍 Troubleshooting

### Common Issues

**Model not found error:**

```bash
# Solution: Train models first
python train_advanced.py
```

**Import error:**

```bash
# Solution: Install dependencies
pip install -r requirements.txt
```

**API connection refused:**

```bash
# Solution: Start API server
uvicorn src.api.main:app --host 0.0.0.0 --port 8000
```

**Streamlit not starting:**

```bash
# Solution: The default port may be in use; try another one
streamlit run src/streamlit_app.py --server.port 8502
```

## 📝 Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- **scikit-learn** for machine learning tools
- **FastAPI** for the modern web framework
- **Streamlit** for the interactive web interface
- **Open source community** for datasets and tools

---

⭐ **If you find this project helpful, please give it a star!**

Made with ❤️ for fighting spam emails
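
The text preprocessing, statistical feature, and spam indicator steps listed under "How It Works" can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: the regular expressions, the `URGENCY_WORDS` list, and the function names `preprocess` and `statistical_features` are all assumptions made for the example. The project's actual logic lives in `src/data_processing/` and `src/feature_engineering/`.

```python
import re

# Hypothetical patterns for illustration; the project's real rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
REPEATED_PUNCT_RE = re.compile(r"([!?.,])\1+")
CURRENCY_RE = re.compile(r"[$€£]|\b(?:usd|dollars?)\b", re.IGNORECASE)
URGENCY_WORDS = {"urgent", "now", "immediately", "hurry"}  # assumed word list


def preprocess(text: str) -> str:
    """Remove URLs, email addresses and phone numbers, lowercase the text,
    and collapse runs of repeated punctuation to a single character."""
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = PHONE_RE.sub(" ", text)
    text = text.lower()
    text = REPEATED_PUNCT_RE.sub(r"\1", text)
    return " ".join(text.split())  # normalize whitespace


def statistical_features(text: str) -> dict:
    """Compute simple counts on the raw text, before lowercasing,
    so that capital letters are still visible."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "length": len(text),
        "punctuation_count": sum(ch in "!?.,;:" for ch in text),
        "capital_count": sum(ch.isupper() for ch in text),
        "currency_mentions": len(CURRENCY_RE.findall(text)),
        "urgency_words": sum(w in URGENCY_WORDS for w in words),
    }


if __name__ == "__main__":
    sample = "URGENT! You won $1,000,000! Visit http://spam.example/win NOW!!!"
    print(preprocess(sample))
    print(statistical_features(sample))
```

In a full pipeline these counts would typically be combined with the TF-IDF vector of the preprocessed text before being passed to a classifier.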

Quick Setup & Commands

Clone Repository

HTTPS
git clone https://github.com/canuzlas/mail_spam_finder_ML.git
SSH
git clone git@github.com:canuzlas/mail_spam_finder_ML.git

Essential Commands

Navigate to project
cd mail_spam_finder_ML
Install dependencies
pip install -r requirements.txt
Run the demo
python demo.py
