AudioImager: April 2010

About You

I am a final-year master’s student in Information Engineering at the Faculty of Technology, University of Oulu, Finland. Before starting my master’s studies, I completed a four-year bachelor’s degree in Computer Science and Engineering at the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka. You can find my LinkedIn profile at http://fi.linkedin.com/pub/kumaripaba-athukorala/20/319/9b4

I have been involved in several projects during my bachelor’s and master’s studies.

The most recent project I was involved in was the implementation of a money order handling system. I joined this project while working at the Department of Computer Science and Engineering, University of Moratuwa. It is a web application built on the LAMP stack, and I worked on the server-side scripts (in PHP) and the design of the database. The project was a great success and is currently deployed and used by the government.

I was an employee of Codegen International (http://www.codegen.net/), Sri Lanka, for a period of six months. During this period I contributed to the development of a module named Tour Package Builder for the project TravelBox (in Java).

During the final year of my bachelor’s studies I was engaged in the development of a Home Automation and Monitoring System for Mobile Devices. This was a one-year research project carried out by three members as a requirement of the bachelor’s program. In this project I contributed to the development of the Home Configuration tool (in Java), the design of the database (in MySQL) and the server-side scripting (in PHP). (See the project video: http://dsmartdraw.blogspot.com/). The research conducted for this project was published at the IEEE conference NGMAST’09.

During my undergraduate period I went through industrial training for a period of nine months at a local IT firm, where I was involved in two different projects. In the first project I was part of the team that developed a VoIP call center solution based on the Skype API. Technologies such as ASP.NET, MS SQL 2000 and .NET charts were used in developing this system. My second project was SuperOffice, a CRM system widely used by European companies. In this project my responsibility was to write the API documentation for developers.

I decided to participate in Google Summer of Code 2010 for several reasons. First, I want to spend this summer productively, and I believe GSoC is the best way to do that. Second, I like contributing to the open source community, and this is one of the best opportunities to do so. Third, my master’s program requires at least two months of full-time industrial training, and participation in GSoC is accepted as industrial training.

I specifically selected a Berkman project because it is research oriented. I am most interested in research, and my area of specialization is intelligent systems. I was therefore looking for a research-oriented project related to my area of interest, and the AudioImager project matches these criteria perfectly. After this summer I plan to start my master’s thesis, so I believe the experience I gain from this project will help me a lot in finding a topic and a position for my thesis.

I have thoroughly reviewed the important dates and times of GSoC 2010 and I have no significant conflicts with the schedule. I understand that this is a serious commitment, and it is the kind of commitment I am looking for.

Proposal

Overview

The main requirement of the AudioImager project is to implement software that automatically selects suitable images for an audio file and generates a video for that audio. To meet this requirement, the software needs the following basic components.

1. Interface for the user to provide the audio file

2. Lyrics search engine to find the lyrics of the audio.

3. Image search engine to find suitable images for the audio

4. Place lyrics, images and audio in the timeline.

5. Provide an interface for the user to edit the lyrics, change the images, test the video

I propose implementing this system in two main phases. The first phase involves finding the lyrics of the audio and bringing them to a standard format. Hence components 1 and 2 will be implemented by the end of this phase.

In the second phase the lyrics will be broken into meaningful phrases, images will be searched for these phrases, and the results will be laid out in a timeline. This mainly involves creating a user-friendly interface that lets the user remove the images suggested by the software, insert the user’s own images and change the lyrics. The user can also change how long a particular image is shown. Hence the remaining components will be implemented by the end of this phase.
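The timeline described above can be pictured as a small data structure. The following is only a sketch under my own assumptions: the class and method names (`Timeline`, `Slot`, `evenly`, `setDuration`, `setImage`) are hypothetical, and spreading the phrases evenly over the audio is just one possible starting layout before the user edits it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical timeline: one slot per phrase, each showing one image. */
class Timeline {
    static class Slot {
        final String phrase;
        String imagePath;   // null until an image is chosen
        long startMs;
        long durationMs;

        Slot(String phrase, long startMs, long durationMs) {
            this.phrase = phrase;
            this.startMs = startMs;
            this.durationMs = durationMs;
        }
    }

    final List<Slot> slots = new ArrayList<>();

    /** Assumed starting layout: every phrase gets an equal share of the audio. */
    static Timeline evenly(List<String> phrases, long audioMs) {
        Timeline timeline = new Timeline();
        if (phrases.isEmpty()) return timeline;
        long each = audioMs / phrases.size(); // leftover milliseconds stay unused
        for (int i = 0; i < phrases.size(); i++) {
            timeline.slots.add(new Slot(phrases.get(i), i * each, each));
        }
        return timeline;
    }

    /** Changes how long one image is shown and shifts the later slots to match. */
    void setDuration(int index, long durationMs) {
        if (durationMs <= 0) throw new IllegalArgumentException("duration must be positive");
        Slot slot = slots.get(index);
        long delta = durationMs - slot.durationMs;
        slot.durationMs = durationMs;
        for (int i = index + 1; i < slots.size(); i++) {
            slots.get(i).startMs += delta;
        }
    }

    /** Replaces the suggested image with one picked by the user. */
    void setImage(int index, String imagePath) {
        slots.get(index).imagePath = imagePath;
    }
}
```

Keeping the start time and duration on each slot means the editing interface only has to change one slot and let the later ones shift.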

Testing will be carried out throughout the implementation process. I would like to implement this project in Java, where the Java Media Framework (JMF) can be used. The first version of this project will be implemented as a desktop application.

Detailed

I propose developing this application in two main phases. The major goals of the first phase are given below.

1. Provide an interface for the user to input the audio file and select the intended use of the video.

2. Search for lyrics of the audio file

a. I propose implementing this by using the information in the header of the audio file. Most audio files carry details about the file in the header, such as the title and the author. This information will be very helpful in this phase. However, header formats vary from one file format to another. Extracting the headers from every format would require reading many specs, which would consume a lot of time. Therefore I am planning to implement header reading for only one popular file format; for other formats, the application will prompt the user to enter the title and the author.

b. The earlier suggestion for this project was to extract the lyrics through speech recognition. After researching this, I found that most open source speech recognition software is not very accurate. Such software also requires an acoustic model built from earlier voice recordings of the speaker, which would not be feasible in the given timeframe. Therefore it is best to get the lyrics by searching the web. For this I am planning to use web scraping.

3. Bring the lyrics to a standard format.

a. Lyrics found on the web can come in various formats. Our concern here is the keywords or meaningful phrases in the lyrics. To find them, we first have to fill in any gaps or missing information in the lyrics and bring them to a standard format.
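As a rough illustration of the header reading in step 2a, the sketch below assumes, only for the sake of example, that the one supported format is MP3 with an ID3v1 tag. The format has not been chosen yet. ID3v1 stores its data in the last 128 bytes of the file: the marker "TAG", then a 30-byte title and a 30-byte artist field. The class name `Id3v1Reader` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Minimal ID3v1 reader (assumed format; the proposal does not fix one). */
class Id3v1Reader {
    static final int TAG_SIZE = 128;

    /** Returns {title, artist}, or null when the bytes end without an ID3v1 tag. */
    static String[] parse(byte[] data) {
        if (data.length < TAG_SIZE) return null;
        int start = data.length - TAG_SIZE;
        // An ID3v1 tag begins with the ASCII marker "TAG".
        if (data[start] != 'T' || data[start + 1] != 'A' || data[start + 2] != 'G') {
            return null;
        }
        String title = field(data, start + 3, 30);
        String artist = field(data, start + 33, 30);
        return new String[] { title, artist };
    }

    // Fields have a fixed width and are padded with zero bytes or spaces.
    private static String field(byte[] data, int offset, int length) {
        String raw = new String(data, offset, length, StandardCharsets.ISO_8859_1);
        int nul = raw.indexOf('\0');
        if (nul >= 0) raw = raw.substring(0, nul);
        return raw.trim();
    }

    /** Reads only the last 128 bytes of the file at the given path. */
    static String[] read(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            if (file.length() < TAG_SIZE) return null;
            byte[] tail = new byte[TAG_SIZE];
            file.seek(file.length() - TAG_SIZE);
            file.readFully(tail);
            return parse(tail);
        }
    }

    public static void main(String[] args) throws IOException {
        String[] tag = read(args[0]);
        System.out.println(tag == null
                ? "No ID3v1 tag found; ask the user for the title and author"
                : tag[0] + " by " + tag[1]);
    }
}
```

When `read` returns null, the application would fall back to prompting the user, as described in step 2a. Many MP3 files carry the newer ID3v2 tag at the start of the file instead, which this sketch does not handle.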

The above three tasks end the first phase. In the second phase the following goals will be achieved.

1. Break the lyrics into meaningful phrases (keywords).

a. This requires a little research. I suggest initially breaking the lyrics into chunks of N words and then searching for an image for each chunk. If there is no image for a given chunk, words are removed from the chunk one by one until the search returns an image. This is one possible method, but I am planning to allocate some time to research this first.

2. Search for suitable images for the keywords.

a. After breaking the lyrics into keywords, we can take the image search results for each phrase. Only images whose usage terms permit the intended use of the video will be selected.

3. Implement the user interface.

a. It provides the images in a timeline.

b. The user can add his/her own images, remove the images selected by the software and change how long each image is displayed.

c. Similarly, the user can change the keywords (lyrics), add new keywords and delete existing ones.

d. User can preview video.

4. Testing the application.
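The chunk-and-shrink idea from step 1a could be sketched as follows. This is only an illustration under stated assumptions: the name `PhraseChunker` is hypothetical, the image search is represented by a `Predicate` that says whether a phrase returns any image, and words are dropped from the end of the chunk (the proposal does not yet say from which end). Dropped words are discarded rather than carried over to the next chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper that turns lyrics into searchable phrases. */
class PhraseChunker {
    /**
     * Splits the lyrics into chunks of at most n words, then shortens each
     * chunk from the end until hasImage accepts it. Chunks for which no
     * prefix finds an image are skipped.
     */
    static List<String> keywords(String lyrics, int n, Predicate<String> hasImage) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        List<String> result = new ArrayList<>();
        String trimmed = lyrics.trim();
        if (trimmed.isEmpty()) return result;
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += n) {
            int end = Math.min(start + n, words.length);
            // Remove word by word until the search finds an image.
            while (end > start) {
                String phrase = String.join(" ", Arrays.copyOfRange(words, start, end));
                if (hasImage.test(phrase)) {
                    result.add(phrase);
                    break;
                }
                end--;
            }
        }
        return result;
    }
}
```

In the real application the predicate would wrap the image search engine. For experiments it can be a lookup in a fixed set of phrases, which makes it easy to try different values of N during the research period.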

Project Plan

Plan for First Trimester: 26 April – 23 May

I will spend this period researching the following areas.

1. Study the header format of a popular audio file format

2. Research on web scraping

3. Work with Java Media Framework (JMF) and research on how it supports the development of this project

4. Research on breaking the lyrics into meaningful phrases.

Plan for Second Trimester: 24 May – 12 July

I will spend this period on implementing the first phase of the project.

1. Implement a simple interface for the user to enter:

a. The audio file

b. Select the intended use of the video from a drop down list.

2. Check the format of the file and prompt the user to enter the author name and the title if the file is not in the supported format.

3. Implement the lyrics search engine.

a. It will search for lyrics using web scraping.

4. Bring the lyrics to a standard format.

5. Break the lyrics into meaningful phrases (keywords).
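A minimal sketch of what step 4 might look like, assuming the lyrics arrive as a scraped HTML fragment. What the "standard format" should be is still open; here it is taken to be a list of clean, non-empty lines. The name `LyricsNormalizer` is hypothetical, and the sketch only cleans the text; it does not fill in missing parts such as a chorus that is written out only once.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cleaner for scraped lyrics. */
class LyricsNormalizer {
    /** Converts scraped lyrics into a list of clean, non-empty lines. */
    static List<String> normalize(String scraped) {
        String text = scraped
                .replaceAll("(?i)<br\\s*/?>", "\n")  // HTML line breaks become newlines
                .replaceAll("<[^>]+>", "")           // drop any other HTML tags
                .replaceAll("\\[[^\\]]*\\]", "");    // drop annotations such as [Chorus]
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\\R")) {
            // Collapse runs of whitespace and skip lines that end up empty.
            String cleaned = line.replaceAll("\\s+", " ").trim();
            if (!cleaned.isEmpty()) lines.add(cleaned);
        }
        return lines;
    }
}
```

Each line of the result can then be passed on to the phrase-breaking step.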

Plan for Third Trimester: 13 July – 9 August

I will spend this period on implementing the second phase of the project.

1. Obtain images matching these phrases.

2. Design the main interface with the following features.

a. Area to display the images in a timeline

b. Audio timeline

c. Lyrics in a timeline

d. Display additional images for the user to select from in a separate area.

e. Let the user browse for an image.

f. Preview the video.

g. Save video.

h. Let the user edit, remove or insert a keyword.

3. Implement the features in the interface.

4. Testing.

Tuesday, April 27, 2010

My Proposal for AudioImager got accepted to GSoC 2010
