Kepuska, Veton


Emeritus Faculty | College of Engineering and Science: Department of Electrical Engineering and Computer Science

vkepuska@fit.edu

Personal Overview

My goal is to make a significant contribution to advancing Human-Machine Interaction and Communication through my Wake-Up-Word (WUW) Speech Recognition (SR) technology. Conventional speech recognition systems operate, at their best, at about 99% accuracy. For a person speaking at a natural conversational rate of 100 words per minute, this implies at least one error per minute. My research has shown that WUW SR makes one error per three hours!
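The arithmetic behind these figures can be checked with a short sketch. It assumes the 100 words-per-minute rate from the text and, purely for comparison, treats the "one error per three hours" figure as if it were a per-word rate at that same speaking pace; the function names are illustrative and not part of any WUW software.

```python
# Back-of-the-envelope check of the error rates quoted in the overview.
WORDS_PER_MINUTE = 100          # speaking rate stated in the text
CONVENTIONAL_ACCURACY = 0.99    # best-case accuracy stated in the text


def errors_per_minute(accuracy, words_per_minute=WORDS_PER_MINUTE):
    """Expected number of misrecognized words per minute of speech."""
    return (1.0 - accuracy) * words_per_minute


def equivalent_word_accuracy(errors, minutes, words_per_minute=WORDS_PER_MINUTE):
    """Assumption: convert 'errors per time span' into a per-word accuracy."""
    words_spoken = minutes * words_per_minute
    return 1.0 - errors / words_spoken


conventional = errors_per_minute(CONVENTIONAL_ACCURACY)
wuw_equivalent = equivalent_word_accuracy(errors=1, minutes=180)

print(f"Conventional SR: {conventional:.2f} errors per minute")
print(f"One error per 3 hours corresponds to {wuw_equivalent:.4%} word accuracy")
```

Under that assumption, three hours of speech at 100 words per minute is 18,000 words, so one error in three hours is 180 times fewer errors than one error per minute.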

Educational Background

1990	Ph.D.	Computer Engineering Clemson University
	Dissertation	Artificial Neural Networks for Speech Recognition Applications
	Advisor	John N. Gowdy
1986	*M.S.*	Computer Engineering Clemson University
	Advisor	John N. Gowdy
1981	*Dipl. Eng.*	Electrical Engineering University of Prishtina
	*Thesis*	The use of the Analog Computers for Simulation and Automatic Control
	Advisor	Abdurrahman Grapci
1976	*Diploma*	Mathematical Gymnasium
	*Diploma Work*	Experimental Methods for Measurements of the Speed of Light
	Advisor	Skender Skenderi

Professional Experience

2003 - Present	*Florida Institute of Technology, Electrical and Computer Engineering – Associate Professor*
	*Accomplishments:*
	Developed a web portal with my graduate students for TRDA of Melbourne in collaboration with Nterspec.
	Developed PPT Commander, a voice-only activated PowerPoint presentation application, and ported it to Apple Mac OS.
	Developed a Voice Activated Elevator Simulator: http://www.youtube.com/watch?v=j5CeVtQMvK0 http://www.youtube.com/watch?v=OQ8eyBTbS_E
	Developed a voice-only activated Nursing Call Station interface for patients. Researching ways to extend it so that the patient can control the bed, TV, and other devices connected to the system.
	Developed a Voice Activated Car Inspection System prototype for BMW: http://files.me.com/hardcaseron/l3byyd.mov
	Designed and developed a High Speed Currency Bill Reader system using embedded hardware.
	Organized and hosted the NIST Rich Transcription Evaluation Workshop; hosted and participated in the international "NIST Rich Transcription Evaluation", 2009.
	First Place in the First Annual Analog Devices & University of Massachusetts DSP Contest, 2005 (Brian Ramos and Don McMann): http://faculty.uml.edu/Mufeed_Mahd/UML_ADI/photo_fit.htm
	Third Place among 38 universities in the IEEE SouthEastCon 2007 Student Hardware Competition: Basketball Robot (Ronald Ramdhan, Xerxes Beharry & Sean Powers): http://www.southeastcon.org/2007/students/. The robot is displayed in the Dean's Conference Room.
	Best Paper Nomination, ASEE 2006: "2006-472: A MATLAB TOOL FOR SPEECH PROCESSING, ANALYSIS AND RECOGNITION: SAR-LAB" (undergraduate co-authors Rogers N., Patel M.).
	Best Junior Design 2007: Visual Audio (Brandon Schmitt).
	Greatest Commercial Potential: "Smart Room" Senior Design 2008 (Matt Hopkins, David Herndon, Patrick Marinelli).
	*Awards:*
	FaST Program: Calculate Potential Energy Savings from Using Mobile Smart Technologies, 2011. http://science.energy.gov/wdts/fast/project-descriptions/2011-projects/epa-calculate-potential-energy-savings-from-using-mobile-smart-technologies/
	Kerry Bruce Clark Teacher, 2009.
	UML-ADI Assistive Device Competition, June 2005, Lowell, MA: First Place. Developed and ported a Wiener-based noise removal algorithm to the Analog Devices ADSP-21161 DSP.
	*Notable Presentations:*
	CS Dept. Curriculum Series Presentation, 2005: “Wake-Up-Word Speech Recognition: A Missing Link toward Natural Language Understanding”.
	*NSF Proposals – PI:* Wrote over 20 NSF proposals.
	*NSF Proposals – Co-PI:* Participated in over 15 NSF proposals.
2001 - 2003	*Speech Recognition Scientist* - ThinkEngine Networks, Inc., 175 Maple Street, Marlborough, MA 01745, USA.
	Invented, designed, and developed a unique solution to *“Wake-Up-Word” or “OnWord™” Spotting* Technology. Wake-Up-Word spotting entails recognition of a specific word or phrase uttered in isolation or in the context of continuous speech.
	Currently this technology is not as widely used as other speech recognition applications because of the poor performance of the systems that offer it, whether commercial (Nuance, SpeechWorks, Philips, Conversay, ART, etc.) or research tools from primarily research and development institutions (Byblos (BBN), Sphinx (CMU), HTK Speech Recognition Tool Kit (Cambridge University, Entropic and Microsoft), etc.). Furthermore, all those systems require computers with powerful CPUs (~1.5 GHz Pentiums) and large memory (512 Mbytes RAM), with the speech recognition process itself requiring tens to hundreds of Mbytes for this feature alone to run in real time.
	An additional advantage of the developed system is that it is also designed to run on a fixed-point DSP, requiring less than 36.2 Kbytes of program memory and 2 Kbytes of model space, and consuming less than 2 million cycles per second on a TI C62xx.
	Inventor of 3 Patented Solutions – Patent Pending: working on generalized scoring using Reversed and Normal Ordered Features for any pattern matching method (e.g., DTW, HMM), to be filed for patent.
	Designed and assisted in the development of a Voice Data Collection System necessary for research, development, testing, and evaluation of the Wake-Up-Word Recognition System.
	Performed and managed 2 data collections over various calling environments (noisy, quiet, public, car, etc.) using various calling devices (cellular, landline, speaker phone).
	Created 2 corpora from the recorded data.
	Those corpora are used for building models of a particular Wake-Up-Word; testing and evaluation of the system; and research, development, and refinement of the Wake-Up-Word Recognition System.
	Transcribed and/or supervised the transcription of the recorded data.
	Set up conventions and standards so that all tools developed to use the corpora comply with a clear set of standards. Converted other corpora (CallHome and PhoneBook) to this set of standards for easy and consistent use.
	Directed and supervised code conversion and porting of the Wake-Up-Word Spotting Technology from floating point to fixed point.
	Developed an automated process, using a combination of perl scripts and perl configuration files that control the parameters affecting each step, for the complex process of: voice activity detection based on cepstral features; Dynamic Time Warping (DTW) matching using reverse ordered feature vectors; rescoring using distribution distortion measurements of the DTW match; generating features from a voice data corpus; building a model of a Wake-Up-Word (e.g., “Operator”, “Help”, “MapQuest”, “Verizon”, etc.) from the features; using the built model to test and evaluate the Wake-Up-Word Recognition System; and generating performance plots, charts, and graphs. The scripts use numerous executables, gnuplot (a graph plotting tool), and other perl scripts. The end result is the automatic generation of a number of plots, charts, and graphs that depict the performance of the system for easy evaluation and comparison.
	Trained and supervised a DSP engineer to port, test, and evaluate the Wake-Up-Word Technology.
	Worked with application developers to integrate the Wake-Up-Word Spotting Technology into a viable demo and potentially viable product.
	Wrote the technical document and manual for this technology.
	Advised the CTO in the decision-making process regarding Speech Recognition, Text to Speech, and Wake-Up-Word Spotting technologies.
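For readers unfamiliar with the matching step named above, the following is a minimal sketch of textbook Dynamic Time Warping between two sequences of feature vectors. It is not the ThinkEngine implementation: the reverse-ordered feature scoring and the distribution-based rescoring described in this entry are not reproduced, and the function name, Euclidean local distance, and step pattern are assumptions chosen for illustration.

```python
import math


def dtw_distance(template, utterance):
    """Textbook DTW cost between two sequences of equal-dimension vectors.

    Assumptions (not from the profile): Euclidean local distance and the
    basic step pattern (insertion, deletion, match) with unit weights.
    """
    n, m = len(template), len(utterance)
    # cost[i][j] = best accumulated cost aligning the first i template
    # frames with the first j utterance frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(template[i - 1], utterance[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # template frame repeated
                cost[i][j - 1],      # utterance frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical one-dimensional "features" for illustration only.
model = [(0.0,), (1.0,), (2.0,)]
slower = [(0.0,), (1.0,), (1.0,), (2.0,)]   # same shape, stretched in time
different = [(2.0,), (1.0,), (0.0,)]

print(dtw_distance(model, slower))
print(dtw_distance(model, different))
```

A time-stretched copy of the model aligns at zero cost, while a different pattern accumulates a positive cost; a word spotter would compare such a cost against a threshold.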
1999 – 2001	*Speech Recognition Scientist* – SpeechWorks International, Inc., Product Group, 695 Atlantic Ave., Boston, MA 02111, USA.
	Developed a noise compensation algorithm to increase recognition robustness against noise and varying channel characteristics.
	Conducted a study of wireless/cellular vs. wireline/landline signal differences and their effect on recognition performance.
	Developed nonlinear front-end signal processing.
	Performed comparative studies of various speech recognition technologies (e.g., AT&T, NUANCE, SPEECHWORKS recognizers).
	Developed algorithms to investigate various features (confidence score, acoustic score, etc.) and their optimal use for combining N-best lists produced by different features (mfcc, lpc, etc.) and different recognizers (segmental, HMM, Watson). The combining algorithm achieved significant error reduction as compared to the best.
	Developed diphone clustering for HMM models to minimize model size.
	Involved in re-alignment of acoustic segments for Text to Speech (TTS) model building data. Developed a framework for modular expansion and refinement of the re-alignment process using perl scripts combined with perl configuration files. Implemented various heuristic rules to improve alignments generated by the speech recognizer to better fit TTS.
	Developed a data collection program for the Dialogic JCT board that supports CSP. Developed, ran, digested, and processed the “Call Environment Data Collection” using this application.
1997 - 1999	*Scientist* - GTE, BBN Technologies, Speech Solutions Group, 70 Fawcett St., Cambridge, MA 02138, USA. Compiled and analyzed the differences between the BYBLOS (research speech recognition technology) and BBN HARK (commercial technology) systems; analyzed possible BYBLOS technologies for porting into BBN HARK; developed and coded a Voice Model Filter that loads BYBLOS and/or BBN HARK training files and converts them into new-format files that comply with the designed specifications; ran various tests (BYBLOS and BBN HARK) for Continuous Densities BBN HARK for benchmarking. Peer reviewed a paper for Speech Communication Journal.
1993 - 1997	*Speech Scientist* – Voice Processing Corporation/Voice Control Systems, Advanced Technology Development Group, One Main Street, MA 02142, USA.
	Enhanced the performance of the existing Front End of the Speech Recognition System, implemented in the VPro line of products, by designing a non-linear smoothing algorithm based on median filtering.
	Developed and implemented Dynamic Features that augmented the existing Front End features.
	Developed a universal preprocessing module of the Front End that enables run-time front-end configuration, decompression, and sample-rate transformations of the original wave file.
	Performed numerous tests that provided critical insights into enhancement and debugging of VProFlex Technology.
	Invented, developed, and integrated a very efficient novel Code Book Search strategy (internally named Fickle Search).
	Compiled a condensed internal report of the literature review study on different ways to perform fast FFTs of a real-valued sequence. Developed, tested, and integrated a Split Radix FFT algorithm; the function can handle real-valued FFTs of any power-of-2 size.
	Modified the Front End to take advantage of a larger FFT size and increased frequency resolution: analyzed the conflicting effect of window size and type (higher frequency resolution causing breakdown of enhancement due to harmonics); analyzed several possible modifications of the enhancement algorithm to accommodate higher frequency resolution; and proposed elimination of pitch harmonics from the spectrum with homomorphic filtering or an LPC-based spectrum. Implemented the LPC-based spectrum, integrating it with the existing Spectral Enhancement module of the Front End.
	Initiated the study toward enhanced composition of boundary and internal acoustic phonemic features.
	Invented, developed, ported, and extensively tested a novel Noise Compensation with Speech Enhancement algorithm. Also invented several integration strategies that take further advantage of the algorithm through better interaction of the Front End with the API and that take advantage of calibration when feasible. The default mode of operation is fully unsupervised in real time.
	Developed an ANN software tool currently supporting five different types of feed-forward back-propagation learning.
	Developed a Pitch Tracking Algorithm based on an enhanced Super-Resolution Pitch Determination Algorithm.
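The non-linear smoothing based on median filtering mentioned in this entry can be illustrated with a generic sliding-window median. This is only a sketch of the general technique, not the VPro front-end algorithm; the function name, the default window of three frames, and the shrinking window at the edges are all assumptions.

```python
from statistics import median


def median_smooth(values, window=3):
    """Sliding-window median filter over a 1-D sequence.

    Assumption: near the edges the window simply shrinks to the samples
    that exist, rather than padding the signal.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


# Hypothetical feature track with a single outlier frame.
track = [1, 1, 9, 1, 1]
print(median_smooth(track))
```

Unlike a moving average, the median removes the isolated spike without smearing it into neighboring frames, which is the usual reason for choosing a non-linear smoother.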
1990 – 1993	*Post-Doctoral Research Associate* - Swiss Federal Institute of Technology, IGP, ETH-Hönggerberg, CH-8093 Zürich, Switzerland. Swiss National Science Foundation Research Project in Image Understanding - Design and Analysis of Spatial Image Sequences.
1985 – 1990	*Teaching Assistant* – Electrical and Computer Engineering Department, Clemson University. Digital Processing of Speech Signals, Digital Systems, Digital Circuit Design and Microprocessor Applications, Electronics, Programming.
1987 - 1990	*Consultant* - Engineering Research and Computer Services Department, Clemson University, Electrical and Computer Engineering Department, Clemson, SC 29634-0915, USA. Design and development of a database system for processing the expenditures of the College of Engineering, Clemson University. Design and development of a database system prototype for automation of: management of repair and maintenance orders; task allocation and duty assignment; time-table management of the assigned personnel; and generation of relevant statistical data.

1985 - 1986	*Software Engineer (Summer Job)* - Keiltronix: Textile Control Systems Inc., 2910 Horseshoe Lane, P.O. Box 1923, Charlotte, NC 28219.
	Developed software for polling and analyzing data from peripheral machine controllers.
	Developed software for real-time graphical display of the status of a manufacturing dyeing process.
	Developed a software package using REGIS as a low-level software tool for real-time dynamic display of the state of a technological process.

1981 - 1984	*Assistant Lecturer* - Electrical Engineering Faculty, University of Prishtina, Republic of Kosova.
	Taught courses in Control Theory, Systems Theory, Algorithms, Digital Communications, Boolean Algebra, Digital Systems, and Programming.
	Contributed to the publication of the first Automatic Control Theory textbook in the Albanian language.
	Key member of the commission that prepared a detailed proposal for the advancement of the curricula of the Electrical and Electronics Engineering Faculty.

Selected Publications

PATENTS:

Dynamic Time Warping (DTW) Using Frequency Distributed Distance Measures: 6983246, January 3, 2006.
Scoring and Rescoring Dynamic Time Warping of Speech: 7085717, April 1, 2006.
Exploiting Differences in Correlations for Modeled and Un-Modeled Sequences by Transforming Trained Model Topology in Sequence Recognition: Provisional Patent Application, August 2009

BOOK CHAPTER

Këpuska, V. "Wake-Up-Word Speech Recognition", Speech Technologies / Book 1, Intech, ISBN 978-953-307-152-7, February 2011.

JOURNAL PUBLICATIONS

Këpuska, V. et al. (2012). Energy Savings from using Mobile Smart Technologies, Journal of Renewable and Sustainable Energy, Submitted 2012
Këpuska, V., Beharry, X., & Powers, S. (2011). "Phoning Home: Bridging the Gap between Conservation and Convenience", JSTEM, 2012.
Këpuska, V, & Rojanasthien, P. (2011) Speech Corpus Generation from DVDs of Movies and TV Series, JITIM, 2011-2012
Këpuska, V (2010). Wake-Up-Word Recognition. SPIE Newsroom, Oct 6 2010. DOI: 10.1117/2.1201009.003154 http://spie.org/x42008.xml?ArticleID=x42008
Rodriguez, W., Fiore, S., De Welde, K., Carstens, D., Këpuska, V. (2010). Ubiquitous Collaboration (uC) Learning, Ubiquitous Learning: Journal of International Technology and Information Management.
Këpuska, V., & Klein, T. (2009). On Wake-Up-Word Speech Recognition Task, Technology, and Evaluation. Elsevier Journal of Nonlinear Analysis.
Këpuska, V., Gurbuz, S., Rodriguez, W., Fiore, S., Carstens, D., Converse, P., Metcalf, D. (2009). uC: Ubiquitous Collaboration Platform for Multimodal Team Interaction Support, Submitted to Journal of International Technology and Information Management (IJTIM), Invited Paper Special Issue on Knowledge Management and Business Intelligence
Këpuska, V. and Mason, S., (1995). A Neural Network Approach to Signalized Point Recognition in Aerial Photographs, Photogrammetric Engineering & Remote Sensing, Vol. 61, No. 7, pp. 917-925, July 1995.
Mason, S. and Këpuska, V., (1992). CONSENS: An Expert System for Photogrammetric Network Design, Allgemeine Vermessungs-Nachrichten, pp. 384-393, September 1992.

CONFERENCE PUBLICATIONS:

Këpuska, V. (2012). Elevator Simulator, IEEE-ESPA, Las Vegas, 2012
Këpuska, V., & Shih, C. (2010). Prosodic Analysis of Alerting and Referential Contexts of Sentinel Words. International Conference on Artificial Intelligence and Pattern Recognition (AIPR'10), Orlando, Florida, 2010
Këpuska, V., & Klein, T. (2008). On Wake-Up-Word Speech Recognition Task, Technology, and Evaluation Results against HTK and Microsoft SDK 5.1. Invited Paper: World Congress on Nonlinear Analysts, Orlando 2008, To appear in Journal of Nonlinear Analysis, Theory, Methods & Applications.
Beharry, X., Këpuska, V., Powers, S., Ramdhan, R., Rojanasthien, P., Weerasooriya, A., (2008). Patriot Robotic System Design, Florida Conference on Recent Advances in Robotics, FCRAR 2008
Këpuska, V., Carstens, D. S., & Wallace, R. (2006). Leading and Trailing Silence in Wake-Up-Word Speech Recognition, Proceedings of the International Conference: Industry, Engineering & Management Systems 2006, Cocoa Beach, FL., 259-266.
Këpuska V., (2006). Wake-Up-Word Application for First Responder Communication Enhancement, SPIE, Orlando, 2006.
Këpuska V., Rogers N., Patel M., (2006). A MATLAB Tool for Speech Analysis, Processing and Recognition: SAR-LAB, ASEE, Chicago, 2006.
Kasza T., Shahsavari M., Këpuska V., Chen Ch., (2006). Communications Protocol for RF-based Indoor Wireless Localization Systems, SPIE, Orlando, 2006.
Anagnostopoulos G., Georgiopoulos M., Ports K., Richie S., White M., Këpuska V., Chan P. K., Wu A., Kysilka M., (2006). Engaging Undergraduate Students in Machine Learning Research: Progress, Experiences and Achievements of Project EMD-MLR, Proceedings of the ASEE 2006 Annual Conference and Exposition, June 18-21, Chicago, Illinois.
Anagnostopoulos G., Georgiopoulos M., Ports K., Richie S., Cardinale N., White M., Këpuska V., Chan P., Wu A., Kysilka M., (2005). Project EMD-MLR: Educational Material Development and Research in Machine Learning for Undergraduate Students, Session 3232, Proceedings of the ASEE 2005 Annual Conference and Exposition, June 12-15, Portland, Oregon.
Mason, S. and Këpuska, V., (1992). On the Representation of Close-Range Network Design Knowledge, XVII ISPRS Congress, Washington D.C., August 1992.
Këpuska, V. and Mason, S., (1991). Automatic Signalized Point Recognition with Feed-Forward Neural Network, IEE Second International Conference on Artificial Neural Networks, Bournemouth, U.K., November, 1991.
Mason, S., Beyer, H., and Këpuska, V., (1991). An AI-based Photogrammetric Network Design System, First Australian Photogrammetric Conference, University of Newcastle, Australia, November 1991.
Këpuska, V. and Mason, S., (1991). Artificial Neural Network Approach to Signalized Point Recognition in Aerial Photographs, First Australian Photogrammetric Conference, University of Newcastle, Australia, November 1991.
Këpuska, V., Beyer, H. and Mason, S., (1991). Artificial Neural Networks for Calibration of CCD-Cameras, Workshop on Industrial Applications of Neural Networks, Ascona, Switzerland, September 1991.
Këpuska, V. and Gowdy, J., (1990). On the Effect of Topological Structure of the Kohonen Network on the Performance of the Hierarchical two Layered Isolated Word Recognition System, IEEE Southeastcon Symposium, New Orleans, April 1990.
Këpuska, V. and Gowdy, J., (1989). Investigation of Phonemic Context in Speech using Self-Organizing Feature Maps, IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP’89, Glasgow, Scotland, May 1989.
Këpuska, V. and Gowdy, J., (1989). Phonemic Speech Recognition Based on Neural Network, IEEE Southeastcon Symposium, Columbia, April 1989.
Këpuska, V. and Gowdy, J., (1988). The Kohonen Net for Speaker Dependent Isolated Word Recognition, IEEE Southeastern Symposium on Systems Theory, UNCC Charlotte, March 1988.
Këpuska, V. and Gowdy, J., (1987). Evaluation of Digital Signal Processing Chips for Speech Processing Applications, IEEE Southeastern Symposium on Systems Theory, Clemson University, Clemson, March 1987.
Këpuska, V. and Gacaferri, J., (1979). The Determination of the Polynomial Coefficients for Approximation of the EKG with Computer, (in Serbo-Croatian), Symposium JUREMA, Zagreb 1979.
Këpuska, V. and Mason, S., (1992) NFP23: Design and Analysis of Spatial Image Sequences, Wissenschaftlicher Bericht zum Schweizerischen Nationalfonds zur Förderung der wissenschaftlichen Forschung, 1992.
Këpuska, V. and Mason, S., (1992) Design and Analysis of Spatial Image Sequences, NFP 23 Third Annual Status Report, Bern, July 6, 1992.
Këpuska, V. and Mason, S., (1991) NFP23: Design and Analysis of Spatial Image Sequences, Wissenschaftlicher Bericht zum Schweizerischen Nationalfonds zur Förderung der wissenschaftlichen Forschung, 1991.
Këpuska, V. and Mason, S., (1991) Design and Analysis of Spatial Image Sequences, NFP 23 Second Annual Status Report, Bern, June 5, 1992.
Mason, S. and Këpuska, V., (1991) NFP 23: Design and Analysis of Spatial Image Sequences (Project Summary), SGAICO Newsletter, Swiss Group for Artificial Intelligence and Cognitive Science, 1991.

Recognition & Awards

2011	FaST - Calculate Potential Energy Savings from Using Mobile Smart Technologies. http://science.energy.gov/wdts/fast/project-descriptions/2011-projects/epa-calculate-potential-energy-savings-from-using-mobile-smart-technologies/
2008 - 2009	Kerry Bruce Clark Teacher
2008	Greatest Commercial Potential - "Smart Room" Senior Design 2008.
2007	Third Place in IEEE SouthEastCon Student Hardware Competition: Basketball Robot
2007	Best Junior Design 2007 - Visual Audio
2006	Best Paper Nomination "2006-472: A MATLAB TOOL FOR SPEECH PROCESSING, ANALYSIS AND RECOGNITION: SAR-LAB"
2005	UML-ADI Assistive Device Competition, June 2005, University of Massachusetts Lowell, MA, First Place http://faculty.uml.edu/Mufeed_Mahd/UML_ADI/photo_fit.htm
1984 – 1985	Fulbright Fellow
1987 – 1988	Harris Fellow
1977 – 1979	University of Prishtina Fellow