This repository implements a deep learning-based voice number authentication system using a CNN and a Siamese network. It verifies spoken numbers by comparing voice embeddings to reference samples. The CNN extracts audio features (MFCC, spectrogram), and the Siamese architecture determines the similarity between two samples.
The model is trained on a Kaggle dataset. Training mainly compares pairs of audio files: different recordings of the same person speaking the same number.
- Audio Preprocessing: Extracts MFCCs and waveforms from spoken-number audio inputs.
- Siamese Network Architecture: Uses a twin neural network to compute the similarity between two audio samples.
- Triplet Loss / Contrastive Loss: Optimizes the feature embeddings for better verification accuracy.
- Dataset Handling: Supports labeled audio datasets of spoken-number recordings provided on Kaggle (dataset link).
- Training & Evaluation: Implements training with data augmentation and validation during training.
- Inference & Authentication: Takes two input audio files and determines whether or not they match.
- Deployment Ready: Can be integrated into real-world authentication systems.
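
As a rough illustration of the two loss functions named in the feature list, the sketch below writes out their standard textbook forms in plain Python. It is not taken from this repository's training code: the function names, the margin values, and the use of Euclidean distance are assumptions made for the example.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for one pair (margin is an assumed value).

    Matching pairs are pulled together (loss = d^2).
    Non-matching pairs are pushed apart until they are at least
    `margin` away (loss = max(0, margin - d)^2).
    """
    d = euclidean_distance(emb_a, emb_b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin` (an assumed value).
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

In a real training loop these would be computed on batches of embeddings produced by the shared CNN encoder, typically with a deep learning framework rather than plain Python lists.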
1. Clone the repository:
git clone https://github.com/KaushiML3/Numerical-audio-authentication-system_Deep-learning.git
cd Numerical-audio-authentication-system_Deep-learning
2. Install dependencies:
pip install -r requirements.txt
3. Run inference:
- Change directory to the API folder, then run:
python main.py
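
The contents of `main.py` are not shown here. As a hedged sketch of the kind of decision the authentication step makes, the example below compares two embedding vectors and accepts the pair if they are similar enough. The cosine metric, the `is_match` helper, the 0.8 threshold, and the sample vectors are all assumptions for illustration; the repository may use a different metric or a learned similarity score.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as "no similarity"
    return dot / (norm_a * norm_b)


def is_match(emb_a, emb_b, threshold=0.8):
    """Hypothetical decision rule: accept if similarity >= threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Made-up embeddings standing in for the encoder output of two recordings.
reference = [0.9, 0.1, 0.4]
attempt = [0.8, 0.2, 0.5]
print(is_match(reference, attempt))
```

In practice the threshold would be tuned on a validation set to balance false accepts against false rejects.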