# 🚀 Spam Email Classification System
[Python](https://www.python.org/downloads/) · [FastAPI](https://fastapi.tiangolo.com) · [Streamlit](https://streamlit.io) · [MIT License](https://opensource.org/licenses/MIT)
A simple and effective spam email classification system using machine learning. Classify emails as spam or legitimate with high accuracy using multiple ML algorithms.
## ✨ Features
- 🤖 **Multiple ML Models**: Naive Bayes, Logistic Regression, Random Forest
- 🎯 **High Accuracy**: 100% accuracy on the project's test dataset
- 🚀 **Easy to Use**: Simple demo script and web interface
- 🌐 **REST API**: FastAPI service for integration
- 📊 **Model Comparison**: Automatic model selection
- 💾 **Ready to Deploy**: Docker support included
## 📁 Project Structure
```
spam_mail_finder_model/
├── src/
│   ├── api/main.py            # FastAPI REST service
│   ├── streamlit_app.py       # Web interface
│   ├── data_processing/       # Data preprocessing
│   ├── feature_engineering/   # Feature extraction
│   └── models/                # ML models
├── models/                    # Trained models
├── demo.py                    # Demo script
├── train_advanced.py          # Model training
└── requirements.txt           # Dependencies
```
## 🚀 Quick Start
### 1. Installation
```bash
# Clone repository
git clone https://github.com/your-username/spam_mail_finder_model.git
cd spam_mail_finder_model
# Install dependencies
pip install -r requirements.txt
```
### 2. Try the Demo
```bash
# Test with spam text
python demo.py --text "WIN FREE MONEY NOW!!!"
# Test with normal text
python demo.py --text "Hi, can we meet tomorrow at 2 PM?"
# Run batch test
python demo.py --mode batch
```
### 3. Train Your Own Model
```bash
# Train new models
python train_advanced.py
```
### 4. Start Web Interface
```bash
# Launch Streamlit app
streamlit run src/streamlit_app.py
```
### 5. Start API Service
```bash
# Start FastAPI server
uvicorn src.api.main:app --host 0.0.0.0 --port 8000
```
## 📖 Usage Examples
### Demo Script
```bash
# Single email classification
python demo.py --text "URGENT! You won $1,000,000!"
# Interactive mode
python demo.py
# Batch testing
python demo.py --mode batch
```
### API Usage
```bash
# Test API endpoint
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{"text": "FREE MONEY! Click here now!"}'
```
Response:
```json
{
  "prediction": 1,
  "probability": 1.0,
  "classification": "spam",
  "confidence": "high"
}
```
### Python Integration
```python
import requests
# Make prediction
response = requests.post(
    "http://localhost:8000/predict",
    json={"text": "Your email text here"},
)
result = response.json()
print(f"Classification: {result['classification']}")
print(f"Confidence: {result['probability']:.2f}")
```
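The documented `/predict_batch` endpoint (see API Testing below) can be called the same way. A minimal sketch, assuming it accepts a `texts` list as in the curl example; the exact response schema isn't shown in this README, so the raw JSON is printed for inspection:

```python
import requests

# Classify several emails in one request via the documented /predict_batch endpoint.
texts = [
    "WIN FREE MONEY NOW!!!",
    "Hi, can we meet tomorrow at 2 PM?",
]
response = requests.post(
    "http://localhost:8000/predict_batch",
    json={"texts": texts},
)
response.raise_for_status()
print(response.json())  # response schema not documented here; inspect the raw JSON
```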
## 📊 Model Performance
The system reports the following scores on its test dataset:
| Model | Accuracy | Precision | Recall | F1-Score |
|-------|----------|-----------|---------|----------|
| Naive Bayes | 100% | 100% | 100% | 100% |
| Logistic Regression | 100% | 100% | 100% | 100% |
| Random Forest | 100% | 100% | 100% | 100% |
| **Ensemble** | **100%** | **100%** | **100%** | **100%** |
## 🔧 Configuration
### Environment Variables
```bash
# Optional configuration
export MODEL_PATH=models/
export API_HOST=0.0.0.0
export API_PORT=8000
```
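How the service actually consumes these variables isn't shown in this README; the sketch below is one plausible pattern using `os.getenv` with the defaults above, and the `uvicorn.run` call mirrors the CLI command from the Quick Start.

```python
import os

import uvicorn

# Hypothetical configuration loading; the real src/api/main.py may read these differently.
MODEL_PATH = os.getenv("MODEL_PATH", "models/")
API_HOST = os.getenv("API_HOST", "0.0.0.0")
API_PORT = int(os.getenv("API_PORT", "8000"))

if __name__ == "__main__":
    # Equivalent to: uvicorn src.api.main:app --host $API_HOST --port $API_PORT
    uvicorn.run("src.api.main:app", host=API_HOST, port=API_PORT)
```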
### Custom Training
```python
# Modify training parameters in train_advanced.py
TRAINING_CONFIG = {
    'n_samples': 10000,   # Dataset size
    'test_size': 0.2,     # Test split
    'cv_folds': 3,        # Cross-validation folds
    'algorithms': ['naive_bayes', 'logistic_regression', 'random_forest'],
}
```
## 🐳 Docker Deployment
```bash
# Build and run with Docker Compose
docker-compose up --build
# Access services
# API: http://localhost:8000
# Web App: http://localhost:8501
```
## 🧪 Testing
### Run Demo Tests
```bash
# Test all components
python demo.py --mode batch
# Test API functionality
python -c "from src.api.main import app; print('API OK')"
# Test Streamlit app
python -c "from src.streamlit_app import *; print('Streamlit OK')"
```
### API Testing
```bash
# Health check
curl http://localhost:8000/health
# Single prediction
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "Test email"}'
# Batch prediction
curl -X POST http://localhost:8000/predict_batch \
-H "Content-Type: application/json" \
-d '{"texts": ["Email 1", "Email 2"]}'
```
## 📚 How It Works
### 1. Text Preprocessing
- Remove URLs, emails, phone numbers
- Convert to lowercase
- Remove excessive punctuation
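A minimal sketch of this kind of cleaning; the project's own preprocessing lives in `src/data_processing/` and may differ in detail:

```python
import re

def clean_text(text: str) -> str:
    """Illustrative preprocessing: strip URLs, emails, phone numbers, then normalize."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"\S+@\S+", " ", text)                 # remove email addresses
    text = re.sub(r"\+?\d[\d\s().-]{7,}\d", " ", text)   # remove phone-like numbers
    text = text.lower()                                   # lowercase
    text = re.sub(r"([!?.])\1+", r"\1", text)            # collapse repeated punctuation
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("WIN FREE MONEY NOW!!! Visit http://spam.example or call +1 555 123 4567"))
```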
### 2. Feature Extraction
- **TF-IDF Vectorization**: Convert text to numerical features
- **Statistical Features**: Email length, punctuation count, capital letters
- **Spam Indicators**: Currency mentions, urgency words, spam keywords
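A hedged sketch of how TF-IDF and simple hand-crafted indicators can be combined into one feature matrix; `src/feature_engineering/feature_extractor.py` is the authoritative implementation, and the keyword list here is purely illustrative:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

SPAM_WORDS = {"free", "win", "winner", "urgent", "money", "click"}  # illustrative only

def statistical_features(texts):
    """Per-email counts: length, punctuation, capitals, currency symbols, spam keywords."""
    rows = []
    for t in texts:
        rows.append([
            len(t),
            sum(c in "!?." for c in t),
            sum(c.isupper() for c in t),
            t.count("$") + t.count("€") + t.count("£"),
            sum(w in SPAM_WORDS for w in t.lower().split()),
        ])
    return csr_matrix(np.array(rows, dtype=float))

texts = ["WIN FREE MONEY NOW!!!", "Hi, can we meet tomorrow at 2 PM?"]
tfidf = TfidfVectorizer(max_features=5000)
X = hstack([tfidf.fit_transform(texts), statistical_features(texts)])
print(X.shape)  # TF-IDF columns plus the five statistical columns
```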
### 3. Model Training
- Train multiple ML algorithms
- Use cross-validation for model selection
- Automatically save the best performing model
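A compact sketch of that selection loop on a toy dataset; the file name and the bundling of the vectorizer with the model are assumptions, not the project's actual `train_advanced.py`:

```python
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Toy dataset for illustration; the real pipeline uses the features described above.
texts = ["WIN FREE MONEY NOW!!!", "Meeting at 2 PM tomorrow", "URGENT! Claim your prize",
         "Lunch on Friday?", "FREE gift, click here", "Here are the meeting notes"]
labels = [1, 0, 1, 0, 1, 0]  # 1 = spam, 0 = legitimate

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

candidates = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Keep whichever model has the best mean cross-validation accuracy, then persist it.
scores = {name: cross_val_score(model, X, labels, cv=3).mean()
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, labels)
joblib.dump({"vectorizer": vectorizer, "model": best_model}, "best_model.joblib")
print(f"Best model: {best_name} (CV accuracy {scores[best_name]:.2f})")
```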
### 4. Prediction
- Load trained model and feature extractor
- Process new email text
- Return classification and confidence score
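Continuing the sketch above, inference is just loading the saved artifact and reusing it; the output keys mirror the API response shown earlier, and the file name is again hypothetical:

```python
import joblib

# Load the artifact written by the training sketch above.
artifact = joblib.load("best_model.joblib")
vectorizer, model = artifact["vectorizer"], artifact["model"]

features = vectorizer.transform(["URGENT! You won $1,000,000!"])
prediction = int(model.predict(features)[0])
probability = float(model.predict_proba(features)[0][prediction])

print({
    "prediction": prediction,
    "classification": "spam" if prediction == 1 else "legitimate",
    "probability": round(probability, 2),
})
```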
## 🛠️ Development
### Project Setup
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -r requirements.txt
# Run formatting
black . && isort .
```
### Adding New Features
1. **New Models**: Add to `src/models/classifiers.py`
2. **New Features**: Modify `src/feature_engineering/feature_extractor.py`
3. **API Endpoints**: Add to `src/api/main.py`
4. **Web Components**: Update `src/streamlit_app.py`
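The internal layout of `src/models/classifiers.py` isn't shown here, so the snippet below is only a rough illustration of registering an extra scikit-learn estimator alongside the existing ones; adapt it to however that file actually organizes its models:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

# Hypothetical addition: wrapping LinearSVC in CalibratedClassifierCV gives it
# predict_proba, which the API's probability field appears to rely on.
EXTRA_MODELS = {
    "linear_svm": CalibratedClassifierCV(LinearSVC()),
}
```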
## 📄 File Descriptions
| File | Purpose |
|------|---------|
| `demo.py` | Interactive demo and testing script |
| `train_advanced.py` | Model training with hyperparameter tuning |
| `src/api/main.py` | FastAPI REST service |
| `src/streamlit_app.py` | Web interface |
| `src/data_processing/data_loader.py` | Data loading and preprocessing |
| `src/feature_engineering/feature_extractor.py` | Feature extraction |
| `src/models/classifiers.py` | ML model implementations |
## 🔍 Troubleshooting
### Common Issues
**Model not found error:**
```bash
# Solution: Train models first
python train_advanced.py
```
**Import error:**
```bash
# Solution: Install dependencies
pip install -r requirements.txt
```
**API connection refused:**
```bash
# Solution: Start API server
uvicorn src.api.main:app --host 0.0.0.0 --port 8000
```
**Streamlit not starting:**
```bash
# Solution: Check if port is available
streamlit run src/streamlit_app.py --server.port 8502
```
## 📝 Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- **scikit-learn** for machine learning tools
- **FastAPI** for the modern web framework
- **Streamlit** for the interactive web interface
- **Open source community** for datasets and tools
---
⭐ **If you find this project helpful, please give it a star!**
Made with ❤️ for fighting spam emails
## ⚡ Quick Setup & Commands

### Clone Repository
```bash
# HTTPS
git clone https://github.com/canuzlas/mail_spam_finder_ML.git

# SSH
git clone git@github.com:canuzlas/mail_spam_finder_ML.git
```

### Essential Commands
```bash
# Navigate to the project
cd mail_spam_finder_ML

# Install dependencies
pip install -r requirements.txt

# Run the application
python main.py
```