tts-w-gTTS-pyttsx3

Public

Created Aug 30, 2025

This project converts user-supplied text into natural-sounding human speech.

Repository Details

Primary Language

Python

Repository Size 0 MB

Default Branch main

Created August 30, 2025

Last Update August 31, 2025

README.md

# 🔊 Text-to-Speech (TTS) Project

This project converts user-supplied text into natural-sounding human speech.

## 🎯 Project Goals

- Turkish and English text-to-speech conversion
- Baseline (gTTS, pyttsx3) and custom-model approaches
- API and web demo interface
- Deployment-ready structure

## 📂 Project Structure

```
text-to-speech/
│── data/                # Dataset
│── notebooks/           # EDA, model experiments
│── src/
│    ├── preprocessing/  # Text & audio preprocessing
│    ├── models/         # Tacotron2, HiFiGAN, etc.
│    ├── inference.py    # Text -> speech pipeline
│    └── api.py          # FastAPI endpoint
│── app.py               # Streamlit demo
│── requirements.txt
│── Dockerfile
│── README.md
```

## 🚀 Quick Start

### 1. Requirements
```bash
pip install -r requirements.txt
```

### 2. Baseline TTS Test
```bash
# Turkish text synthesis
python src/inference.py --text "Merhaba dünya" --method baseline --lang tr

# English text synthesis
python src/inference.py --text "Hello world" --method baseline --lang en
```
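
The repository's `src/inference.py` is not shown here, so the following is only a minimal sketch of how a baseline command line like the one above could be wired to gTTS and pyttsx3. The flags `--text`, `--method`, and `--lang` come from the commands above; `--engine`, `--out`, and the function names are hypothetical additions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser with the three flags used in the README commands."""
    parser = argparse.ArgumentParser(description="Baseline TTS inference (sketch)")
    parser.add_argument("--text", required=True, help="Text to synthesize")
    parser.add_argument("--method", default="baseline", choices=["baseline"])
    parser.add_argument("--lang", default="tr", help="Language code, e.g. tr or en")
    # Hypothetical extras, not documented in the README:
    parser.add_argument("--engine", default="gtts", choices=["gtts", "pyttsx3"])
    parser.add_argument("--out", default="output.mp3", help="Output audio path")
    return parser


def synthesize_baseline(text: str, lang: str, engine: str, out_path: str) -> str:
    """Write speech for `text` to `out_path` and return the path."""
    if engine == "gtts":
        from gtts import gTTS  # imported lazily; needs internet access

        gTTS(text=text, lang=lang).save(out_path)  # gTTS produces MP3 data
    elif engine == "pyttsx3":
        import pyttsx3  # imported lazily; uses voices installed on the system

        tts = pyttsx3.init()
        # Language selection in pyttsx3 depends on the installed voices,
        # so `lang` is not applied in this sketch.
        tts.save_to_file(text, out_path)
        tts.runAndWait()  # blocks until the queued file has been written
    else:
        raise ValueError(f"unknown engine: {engine}")
    return out_path
```

A caller would parse the arguments with `build_parser().parse_args()` and pass `args.text`, `args.lang`, `args.engine`, and `args.out` to `synthesize_baseline`. The lazy imports keep the parser usable even when only one of the two engines is installed.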

### 3. Demo Interface
```bash
streamlit run app.py
```
Open http://localhost:8501 in your browser.

### 4. API Server
```bash
uvicorn src.api:app --reload
```
API documentation: http://localhost:8000/docs

### 5. API Usage
```bash
# Turkish TTS
curl -X POST "http://localhost:8000/tts/synthesize" \
  -H "Content-Type: application/json" \
  -d '{"text": "Merhaba dünya!", "engine": "gtts", "language": "tr"}'

# Batch synthesis of multiple texts
curl -X POST "http://localhost:8000/tts/batch" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Merhaba", "Nasılsın", "İyi günler"], "language": "tr"}'
```
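
The same two requests can be issued from Python. The sketch below uses only the standard library and mirrors the URLs and JSON bodies of the curl commands above. The helper names are hypothetical, and because the response format is not documented here, `send` simply returns the raw response bytes.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://localhost:8000"  # address used in the curl examples


def _post_json(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_synthesize_request(text: str, engine: str = "gtts",
                             language: str = "tr") -> urllib.request.Request:
    """Request for /tts/synthesize with the fields shown in the curl example."""
    return _post_json("/tts/synthesize",
                      {"text": text, "engine": engine, "language": language})


def build_batch_request(texts: List[str], language: str = "tr") -> urllib.request.Request:
    """Request for /tts/batch with the fields shown in the curl example."""
    return _post_json("/tts/batch", {"texts": texts, "language": language})


def send(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body (format is an assumption)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With the API server running, `send(build_synthesize_request("Merhaba dünya!"))` performs the first curl call. Separating request construction from sending keeps the payload logic checkable without a live server.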

### 6. Running with Docker
```bash
# Build the image
docker build -t tts-app .

# Run the container
docker run -p 8000:8000 tts-app

# For the Streamlit demo
docker run -p 8501:8501 tts-app streamlit run app.py --server.address 0.0.0.0
```

## 📊 Dataset

- **English**: LJ Speech Dataset
- **Turkish**: Mozilla Common Voice TR
- **Multilingual**: Mozilla Common Voice

## 🏗️ Models

### Baseline
- gTTS (Google TTS)
- pyttsx3 (Offline TTS)

### Custom Models
- Tacotron 2 → Text to Mel-spectrogram
- HiFi-GAN → Mel-spectrogram to Audio
- FastSpeech 2 → Faster inference
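
The custom models listed above form a two-stage pipeline: an acoustic model (Tacotron 2 or FastSpeech 2) maps text to a mel-spectrogram, and a vocoder (HiFi-GAN) maps the mel-spectrogram to audio. The sketch below shows only that composition. The two `dummy_*` functions are stand-ins, not real models, and the 80 mel bins and 256-sample hop are illustrative values rather than settings taken from this project.

```python
from typing import Callable, List

MelSpectrogram = List[List[float]]  # frames x mel bins
Waveform = List[float]              # audio samples

N_MELS = 80       # illustrative number of mel bins
HOP_LENGTH = 256  # illustrative number of audio samples per mel frame


def synthesize(text: str,
               acoustic_model: Callable[[str], MelSpectrogram],
               vocoder: Callable[[MelSpectrogram], Waveform]) -> Waveform:
    """Run the two stages in sequence: text -> mel-spectrogram -> waveform."""
    mel = acoustic_model(text)
    return vocoder(mel)


def dummy_acoustic_model(text: str) -> MelSpectrogram:
    """Stand-in for an acoustic model: one silent mel frame per character."""
    return [[0.0] * N_MELS for _ in text]


def dummy_vocoder(mel: MelSpectrogram) -> Waveform:
    """Stand-in for a vocoder: HOP_LENGTH silent samples per mel frame."""
    return [0.0] * (len(mel) * HOP_LENGTH)


audio = synthesize("Merhaba", dummy_acoustic_model, dummy_vocoder)
```

Because the stages only meet at the mel-spectrogram, an acoustic model can be swapped (for example Tacotron 2 for FastSpeech 2) without touching the vocoder, provided both agree on the mel configuration.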

## 📈 Evaluation Metrics

- **Objective**: MCD (mel-cepstral distortion), SNR (signal-to-noise ratio)
- **Subjective**: MOS (Mean Opinion Score)
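
As a rough illustration of these metrics, the sketch below computes them in plain Python. It assumes time-aligned inputs: SNR treats the sample-wise difference between a reference and a degraded signal as noise, and MCD uses the common form $\frac{10}{\ln 10}\sqrt{2\sum_{d}(c_d-\hat{c}_d)^2}$ per frame, skipping the 0th (energy) coefficient. Frame alignment (for example with dynamic time warping) and cepstrum extraction are out of scope, and the function names are hypothetical.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], degraded: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB, treating (degraded - reference) as noise."""
    if len(reference) != len(degraded):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in reference)
    if signal_power == 0:
        raise ValueError("reference signal has zero power")
    noise_power = sum((y - x) ** 2 for x, y in zip(reference, degraded))
    if noise_power == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(signal_power / noise_power)


def mcd_db(ref_frames: Sequence[Sequence[float]],
           syn_frames: Sequence[Sequence[float]]) -> float:
    """Mean mel-cepstral distortion in dB over already aligned frames."""
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need the same non-zero number of frames")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Skip coefficient 0, which mostly carries frame energy.
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


def mean_opinion_score(ratings: Sequence[int]) -> float:
    """Average of listener ratings, typically on a 1-5 scale."""
    if not ratings:
        raise ValueError("no ratings given")
    return sum(ratings) / len(ratings)
```

Lower MCD and higher SNR indicate output closer to the reference, while MOS depends entirely on the collected listener ratings.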

## 🛠️ Technology Stack

- **Modeling**: PyTorch, Coqui TTS
- **Data Processing**: librosa, phonemizer
- **API**: FastAPI
- **Demo**: Streamlit
- **Deployment**: Docker, HuggingFace Spaces

## 📋 Roadmap

- [x] Project structure setup
- [x] Baseline TTS implementation (gTTS + pyttsx3)
- [x] API development (FastAPI)
- [x] Demo interface (Streamlit)
- [x] Docker configuration
- [ ] Dataset preparation (Mozilla Common Voice TR)
- [ ] Custom model training (Tacotron 2 + HiFi-GAN)
- [ ] Evaluation metrics (MCD, SNR, MOS)
- [ ] Deployment (HuggingFace Spaces / AWS)
- [ ] Advanced features (voice cloning, multilingual TTS)

## 📊 Project Statistics

- **Supported languages**: Turkish, English, French, German, Spanish
- **TTS engines**: gTTS (online), pyttsx3 (offline)
- **API endpoints**: 7 endpoints (synthesis, batch, download, etc.)
- **Demo features**: web interface, audio player, file download
- **Deployment**: Docker-ready, FastAPI + Streamlit
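
The supported languages and engines above, together with the README's stated limit of 1000 characters per request, suggest a small validation step before synthesis. The sketch below is one way to express it. The class and function names are hypothetical, the ISO 639-1 codes for French, German, and Spanish are assumed (only `tr` and `en` appear in the README), and the actual API may validate differently.

```python
from dataclasses import dataclass
from typing import List

# ISO 639-1 codes for the five languages listed in the README.
SUPPORTED_LANGUAGES = {
    "tr": "Turkish",
    "en": "English",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
}
SUPPORTED_ENGINES = {"gtts", "pyttsx3"}
MAX_CHARS = 1000  # per-request limit stated in the README


@dataclass(frozen=True)
class TTSRequest:
    """Fields matching the JSON body of the synthesize endpoint."""
    text: str
    engine: str = "gtts"
    language: str = "tr"


def validate(request: TTSRequest) -> List[str]:
    """Return a list of problems; an empty list means the request is acceptable."""
    errors: List[str] = []
    if not request.text.strip():
        errors.append("text is empty")
    elif len(request.text) > MAX_CHARS:
        errors.append(f"text exceeds {MAX_CHARS} characters")
    if request.engine not in SUPPORTED_ENGINES:
        errors.append(f"unsupported engine: {request.engine}")
    if request.language not in SUPPORTED_LANGUAGES:
        errors.append(f"unsupported language: {request.language}")
    return errors
```

Returning every problem at once, rather than raising on the first, lets an API report all issues in a single error response.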

## 📈 Performance

### Baseline Performance
- **gTTS**: ~2-3 seconds per sentence (requires an internet connection)
- **pyttsx3**: ~1-2 seconds per sentence (offline)
- **Supported input length**: max 1000 characters per request
- **Output format**: WAV (16 kHz, mono)
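
To check that an audio file matches the stated output format (WAV, 16 kHz, mono), the standard-library `wave` module is enough. The sketch below writes a short silent clip in that format and reads its header back. The 16-bit sample width and the function names are assumptions, since the README does not specify them.

```python
import io
import wave

SAMPLE_RATE = 16000  # 16 kHz, as stated in the README
CHANNELS = 1         # mono
SAMPLE_WIDTH = 2     # bytes per sample (16-bit PCM, an assumption)


def make_silent_wav(seconds: float = 0.1) -> bytes:
    """Return WAV bytes containing `seconds` of silence in the stated format."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00" * SAMPLE_WIDTH * round(seconds * SAMPLE_RATE))
    return buffer.getvalue()


def describe_wav(data: bytes) -> dict:
    """Read channel count, sample rate, and duration from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_stated_format(data: bytes) -> bool:
    """True if the audio is mono and sampled at 16 kHz."""
    info = describe_wav(data)
    return info["channels"] == CHANNELS and info["sample_rate"] == SAMPLE_RATE
```

The same `describe_wav` call works on a file produced by the API once its bytes are read from disk, and the reported duration can be compared against synthesis time when measuring the per-sentence latencies listed above.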

## 🔄 Current Status (v1.0.0)

✅ **Completed**:
- Baseline TTS system
- REST API (FastAPI)
- Web demo (Streamlit)
- Docker configuration
- Multi-language support

🚧 **In Progress**:
- Custom model training pipeline
- Dataset preprocessing
- Evaluation metrics

🎯 **Planned**:
- Voice cloning
- Real-time streaming
- HuggingFace Spaces deployment

## 🤝 Contributing

1. Fork the repository
2. Create feature branch
3. Commit changes
4. Create Pull Request

## 📄 License

MIT License

Quick Setup & Commands

Clone Repository

HTTPS

git clone https://github.com/canuzlas/tts-w-gTTS-pyttsx3.git

SSH

git clone git@github.com:canuzlas/tts-w-gTTS-pyttsx3.git

Essential Commands

Navigate to project

cd tts-w-gTTS-pyttsx3

Install dependencies

pip install -r requirements.txt

Run application

python main.py
