MSc Thesis

This MSc thesis for Trinity College Dublin implements building recognition on Android-powered mobile devices using captured camera frames, without the need for a network connection to a remote server. The visual feature database used for recognition is stored on the device itself, requiring it to be optimised to reduce its size. This optimisation is obtained using clustering and genetic algorithms.

The research covers the design and development of a framework for creating a small visual feature database. This database is to be used on mobile devices to perform building recognition in a self-contained "Tell me what I am looking at" application using two inputs: GPS data and camera images.

The main contribution of this approach is exploring the automated creation of a compact local visual feature database to be installed on the mobile device. Using a local database is justified by scenarios where a data connection to a remote server is unavailable or too expensive (e.g. tourists using data roaming abroad).

Creating a compact database requires balancing various constraints. The number of visual features in the database affects both the size of the database on the limited storage of a mobile platform and the computation time of the image matching. However, too few features leads to poor recognition. This research evaluates the use of a genetic algorithm that selects the best parameters for building the database using visual feature clustering.
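As an illustration, a genetic search over database-building parameters could look like the minimal C++ sketch below. The parameter set (cluster count and clustering threshold) and the fitness function are hypothetical stand-ins: a real run would score each candidate by building the database and measuring matching accuracy on a validation set.

    // Minimal sketch of a genetic search over database-building parameters.
    // Both the parameter set and the fitness function are hypothetical.
    #include <algorithm>
    #include <cstddef>
    #include <random>
    #include <vector>

    struct Params {
        int   clusters;    // number of feature clusters kept in the database
        float threshold;   // clustering distance threshold
    };

    std::mt19937 rng{42};

    // Placeholder fitness: trades off (simulated) accuracy against size.
    float fitness(const Params& p) {
        float accuracy = 1.0f - 1.0f / (1.0f + p.clusters * p.threshold);
        float sizeCost = 0.001f * p.clusters;   // penalise large databases
        return accuracy - sizeCost;
    }

    Params randomParams() {
        std::uniform_int_distribution<int> c(50, 2000);
        std::uniform_real_distribution<float> t(0.1f, 2.0f);
        return {c(rng), t(rng)};
    }

    Params mutate(Params p) {
        std::normal_distribution<float> jitter(0.0f, 0.1f);
        p.clusters  = std::max(1, int(p.clusters * (1.0f + jitter(rng))));
        p.threshold = std::max(0.01f, p.threshold + jitter(rng));
        return p;
    }

    int main() {
        std::vector<Params> pop(30);
        for (auto& p : pop) p = randomParams();

        for (int gen = 0; gen < 100; ++gen) {
            // Keep the fittest half, refill with mutated crossovers of the best.
            std::sort(pop.begin(), pop.end(), [](const Params& a, const Params& b) {
                return fitness(a) > fitness(b);
            });
            for (std::size_t i = pop.size() / 2; i < pop.size(); ++i)
                pop[i] = mutate({pop[i % 5].clusters, pop[(i + 1) % 5].threshold});
        }
        // pop[0] now holds the best parameter set found.
    }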

Download the full text version.

Trinity Inclusive Curriculum Online Tool

The Trinity Inclusive Curriculum (TIC) Online tool is a web application developed for the Disability Service of Trinity College Dublin, Ireland. I designed and developed this application as a freelancer. The aim of the application is to help teachers and staff at the University of Dublin evaluate the accessibility of their courses and practices. This is achieved by providing them with questionnaires that, once filled in, produce a report of suggestions.

In this context, the accessibility of the application itself is paramount. The application complies with the W3C Web Accessibility Initiative guidelines at the Double-A level, and it has been developed and tested with third-party commercial screen readers in mind.

The application has two main interfaces: the end-user interface and the editor interface. In the editor interface, authorised editors can publish and modify the questionnaires, which are divided into sections and questions. Each section or question can be enriched with a sample video, hosted on YouTube but accompanied by a text transcript available in the tool, so that screen reader users can access this otherwise unreachable content (at the time of writing, YouTube lacks plain text transcript support).

The end-user interface allows users to start evaluations, invite other users to complete a joint evaluation, and generate reports. Evaluations and reports are available both for printing and for downloading as MS Word documents.

On the technical side, the application runs on the classic PHP and MySQL platform, using the CodeIgniter and jQuery frameworks. Both the XHTML and the CSS validate.

Prey And Predator

A behavioural animation simulating a prey and a predator in a marine environment, developed as part of university coursework. The AI of the fish includes two senses (sight and hearing) and a knowledge representation based on perceptions. The demo also contains a playable mode where the player controls a cod and tries to avoid the shark. Coded in XNA/C# with basic skeletal animation done in Blender.

Download a detailed description of this demo.

In this demo I wanted to investigate the result of combining a sensor system with an awareness model to mimic prey-predator behaviour. The simulation takes into account vision, which is relatively limited in an underwater environment, and the mechanoperception provided by the lateral line of the fish, which detects vibrations and hence movement and sound. This second sense reaches farther than sight and is used by predators to identify possible prey before actually spotting them.
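The two senses can be modelled as a narrow, short-range vision cone plus a long-range omnidirectional motion sense. The C++ sketch below illustrates the idea; the ranges, angles, and types are hypothetical, not the demo's actual values.

    // Hypothetical two-sense detection test: a narrow vision cone and an
    // omnidirectional lateral line that only senses moving targets.
    #include <cmath>

    struct Vec2 { float x, y; };

    float dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
    float len(Vec2 a) { return std::sqrt(dot(a, a)); }

    struct Fish {
        Vec2  pos, heading;           // heading is a unit vector
        float sightRange     = 15.0f; // vision: short range, narrow cone
        float sightHalfAngle = 1.0f;  // radians
        float lateralRange   = 40.0f; // lateral line: long range, all around
    };

    // Vision: the target must be within range and inside the view cone.
    bool sees(const Fish& f, Vec2 target) {
        Vec2 d{target.x - f.pos.x, target.y - f.pos.y};
        float dist = len(d);
        if (dist > f.sightRange || dist == 0.0f) return false;
        return dot(f.heading, {d.x / dist, d.y / dist}) > std::cos(f.sightHalfAngle);
    }

    // Lateral line: only moving targets are felt, but in every direction.
    bool feels(const Fish& f, Vec2 target, float targetSpeed) {
        Vec2 d{target.x - f.pos.x, target.y - f.pos.y};
        return targetSpeed > 0.5f && len(d) < f.lateralRange;
    }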

This demo doesn't focus on an accurate physical simulation or animation; it implements a simple scenario where a prey tries to escape an attacking predator. Success depends on the prey's ability to detect the predator before it attacks. The initial positions of prey and predator are randomly chosen, and the final outcome is not predefined.

The detection of a predator is based on a finite state machine that takes into account the awareness level of the prey. This awareness level influences the prey's future perceptions and actions, e.g. if the prey detects something unusual in an area, it will not approach that area even if it cannot clearly detect a predator.
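A minimal sketch of such an awareness-driven state machine is shown below; the state names and thresholds are hypothetical, not the ones used in the demo.

    // Hypothetical awareness states for the prey. The stimulus value would
    // come from the sensor system (sight and lateral line).
    enum class State { Calm, Suspicious, Fleeing };

    State update(State s, float stimulus) {
        switch (s) {
            case State::Calm:
                return stimulus > 0.3f ? State::Suspicious : State::Calm;
            case State::Suspicious:
                // A suspicious prey avoids the area even without a confirmed
                // sighting; a strong stimulus triggers flight.
                if (stimulus > 0.7f) return State::Fleeing;
                if (stimulus < 0.1f) return State::Calm;
                return State::Suspicious;
            case State::Fleeing:
                return stimulus < 0.05f ? State::Suspicious : State::Fleeing;
        }
        return s;
    }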

The 3D models of the shark and the fish are taken from the Tuocan virtual museum, and I added a very rudimentary skeletal animation for swimming/biting using Blender 3D.

Main features

  • Animated 3D models in a marine environment
  • Multi-agent environment, with agents modelled as finite state machines
  • Sensors have fields of "view" with different resolutions
  • Stimuli generated by the sensors build the agent's knowledge base, which influences its behaviour
  • Various factors affect sensors: distance, angle, speed, obstacles
  • Different scenarios (1 vs. 1, 1 vs. many, player vs. computer)

Wiicam Block

A Tetris-style proof-of-concept game for PC using a webcam and a wiimote as input devices, developed as part of university coursework. Coded in C#/XNA and C++ (OpenCV).

This game relies on a webcam and a Wii Remote controller (wiimote) to collect user input. The player interacts with the game by holding the wiimote and moving and rotating it in front of the camera.

The goal of the game is to align blocks of the same colour falling from the top, in a Tetris-like fashion. The player changes column by moving the wiimote horizontally and spins the blocks by rotating the wiimote. The game also has a built-in camera calibration and testing procedure that allows the player to calibrate the camera using a printed checkerboard.
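With OpenCV's C++ API, a checkerboard calibration step of this kind could look roughly like the sketch below; the board size and capture loop are assumptions, not the project's actual code.

    // Sketch of checkerboard camera calibration with OpenCV.
    #include <opencv2/opencv.hpp>
    #include <vector>

    int main() {
        cv::Size patternSize(9, 6);   // inner corners of the printed board
        std::vector<std::vector<cv::Point2f>> imagePoints;
        std::vector<std::vector<cv::Point3f>> objectPoints;

        // Reference corner positions on the (planar) board, in board units.
        std::vector<cv::Point3f> board;
        for (int y = 0; y < patternSize.height; ++y)
            for (int x = 0; x < patternSize.width; ++x)
                board.emplace_back(float(x), float(y), 0.0f);

        cv::VideoCapture cap(0);
        cv::Mat frame, gray;
        while (imagePoints.size() < 15 && cap.read(frame)) {
            cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
            std::vector<cv::Point2f> corners;
            if (cv::findChessboardCorners(gray, patternSize, corners)) {
                imagePoints.push_back(corners);
                objectPoints.push_back(board);
            }
        }

        cv::Mat cameraMatrix, distCoeffs;
        std::vector<cv::Mat> rvecs, tvecs;
        cv::calibrateCamera(objectPoints, imagePoints, gray.size(),
                            cameraMatrix, distCoeffs, rvecs, tvecs);
        // cameraMatrix and distCoeffs can now be used to undistort frames.
    }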

The main features of the application are:

  • Input is provided via the camera (which tracks the wiimote's position visually, rather than relying on the wiimote's standard tracking) and via the wiimote accelerometer
  • Coded in C#/XNA for the game part and C++/OpenCV for the camera input
  • Built-in camera calibration and debugging
  • All the graphical content was created by me

Steering Behaviour

A proof-of-concept of different steering behaviours applied to boids, developed as part of university coursework. Coded in C++ using the OpenSteer library.

In this demo I implemented different steering behaviours on top of the OpenSteer framework. The behaviours implemented are (a sketch of the arrival behaviour follows the list):

  • Arrival
  • Flow following
  • A* path following
  • Queueing
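As an illustration of the first item, the classic arrival behaviour ramps the desired speed down inside a slowing radius so the boid decelerates instead of overshooting the target. The following generic C++ sketch is an illustration of the technique, not the OpenSteer API itself.

    // Arrival steering: decelerate smoothly when approaching the target.
    #include <algorithm>
    #include <cmath>

    struct Vec2 { float x, y; };

    Vec2 arrive(Vec2 pos, Vec2 vel, Vec2 target,
                float maxSpeed, float slowingRadius) {
        Vec2 toTarget{target.x - pos.x, target.y - pos.y};
        float dist = std::sqrt(toTarget.x * toTarget.x + toTarget.y * toTarget.y);
        if (dist < 1e-5f) return {-vel.x, -vel.y};   // at target: cancel velocity

        // Ramp the desired speed down inside the slowing radius.
        float speed = maxSpeed * std::min(1.0f, dist / slowingRadius);
        Vec2 desired{toTarget.x / dist * speed, toTarget.y / dist * speed};

        // Steering force = desired velocity - current velocity.
        return {desired.x - vel.x, desired.y - vel.y};
    }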

Road Detection

Proof-of-concept real-time road lane detection from a test video, coded in C++ using OpenCV, as part of university coursework. The demo superimposes an overlay of the detected lane and steering direction on the original video.
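A common way to build such a pipeline is edge detection followed by a probabilistic Hough transform. The sketch below illustrates that approach with hypothetical parameter values and input file; the original demo may have worked differently.

    // Lane detection sketch: Canny edges + probabilistic Hough transform.
    #include <opencv2/opencv.hpp>
    #include <vector>

    int main() {
        cv::VideoCapture cap("test_video.avi");   // hypothetical input file
        cv::Mat frame, gray, edges;
        while (cap.read(frame)) {
            cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
            cv::Canny(gray, edges, 50, 150);

            // Ignore the upper half of the frame; the road lies in the lower half.
            edges(cv::Rect(0, 0, edges.cols, edges.rows / 2)) = 0;

            std::vector<cv::Vec4i> lines;
            cv::HoughLinesP(edges, lines, 1, CV_PI / 180, 50, 40, 20);
            for (const auto& l : lines)
                cv::line(frame, {l[0], l[1]}, {l[2], l[3]}, {0, 255, 0}, 2);

            cv::imshow("lanes", frame);
            if (cv::waitKey(30) >= 0) break;
        }
    }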

Chaaaaaarge

An augmented reality proof-of-concept game coded in C#/XNA using OpenCV, as part of university coursework. The requirement for this project was to use two non-traditional input methods, and I decided to use a webcam and a microphone.

This is a simple game that allows the user to interact with the application without using the traditional controls (keyboard, mouse, joypad). The interaction is based on a normal sheet of A4 paper lying on the desk in front of the computer. The user can point at positions on the sheet, which are then mapped onto the game world; furthermore, the user interacts with the application using a microphone.
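Mapping a pointed position on the sheet into game-world coordinates can be done with a homography between the sheet's corners and the play area, as in the OpenCV C++ sketch below; the corner coordinates are made up, and the original project may have computed the mapping differently.

    // Map a point on the paper sheet into game-world coordinates.
    #include <opencv2/opencv.hpp>
    #include <vector>

    int main() {
        // Corners of the A4 sheet as detected in the camera image (pixels).
        std::vector<cv::Point2f> sheet{{102, 80}, {538, 95}, {560, 410}, {90, 395}};
        // Matching corners of the game-world play area.
        std::vector<cv::Point2f> world{{0, 0}, {100, 0}, {100, 70}, {0, 70}};

        cv::Mat H = cv::getPerspectiveTransform(sheet, world);

        // Map a pointed position detected at (300, 240) in the image.
        std::vector<cv::Point2f> in{{300, 240}}, out;
        cv::perspectiveTransform(in, out, H);
        // out[0] is the corresponding position on the battlefield.
    }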

In the game, the player faces a minimalistic AI on a medieval-themed battleground. The goal of the game is to destroy the opponent's tower while defending your own. The player cannot directly control the ten soldiers that form his army; instead, he picks the initial positions the soldiers start from. Once the game starts, the soldiers simply move towards the other end of the battlefield, fighting any enemy soldiers found on the way, and finally attacking the enemy tower.

To make things more interesting, there is a destructible obstacle in the middle of the field, and soldiers of the same army can collaborate by pushing friendly soldiers from behind. Since the two armies move towards each other, with just a sprinkle of randomness, the soldiers on the front line will inevitably clash head on. Soldiers from the back rows push the ones on the front line, adding their force to the impact with the enemy. The player can therefore experiment with different formations (the wedge formation proved very useful during testing).

After the battle starts, the player can utter battle cries (though any sound loud enough will do) that are picked up by the microphone. Doing so boosts the pushing force of his own soldiers. The battle itself is started by the player shouting into the microphone (hence the name of the game).
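A loudness trigger of this kind usually boils down to comparing the RMS of the incoming sample buffer against a threshold. The sketch below is a generic C++ illustration; the original project used NAudio for capture, which is omitted here, and the threshold value is an assumption.

    // Detect a "battle cry": RMS loudness of a sample buffer vs. a threshold.
    #include <cmath>
    #include <cstddef>

    bool isBattleCry(const float* samples, std::size_t n, float threshold = 0.3f) {
        if (n == 0) return false;
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            sum += double(samples[i]) * samples[i];
        return std::sqrt(sum / n) > threshold;   // loud enough to count
    }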

Main features

  • Input is provided via the camera, using a simple sheet of white paper, and via the microphone
  • Coded in C#/XNA for the game part and C++/OpenCV for the camera input, with NAudio for the audio
  • Built-in camera calibration and debugging
  • All the graphical content was created by me
  • Graphics are displayed on top of the live video

3D Realtime Physics Simulation

A library for real-time 3D particle and rigid body simulation in OpenGL, created as part of university coursework. In the simulation, particles and bodies respond to forces and are simulated using various integration methods.

This coursework project is implemented in C++/OpenGL and consists of a physics engine compiled as a DLL. The individual demos visible in the videos use the engine to perform the simulation, but they differ in the setup of the scene and in the functionalities of the engine they use.

Flags can be set on the physics engine to activate or deactivate functionalities such as broad-phase collision detection or the drawing of bounding boxes. The individual demos do nothing more than set these flags according to the goal of the specific demo and lay out the planes and the particles or cubes for the scene.

I wrote two versions of this engine. The first version was written with an object-oriented approach: a base class served as a superclass for planes, particles and cubes, and forces were also implemented as objects with their own class hierarchy. The overall idea was to delegate to each instance the execution of all the integration, drawing and updating. Custom-made lists (with some handy extra methods) were also used extensively to keep the code clean. In this implementation the mathematical library "vmath" was used (and extended to implement the inversion operation for 3x3 matrices, previously available only for 4x4 matrices).

This design decision led to well-structured code with a clean class hierarchy, but the performance was not acceptable and there were some glitches with the plane response. In a second phase, the entire engine was rewritten from scratch with a different approach, this time using the mathematical library "WildMagic4". In this second version I went back to a C-style approach, with static functions acting on structs and fixed-size arrays. The code is a bit more convoluted, but performance improved by roughly ten times for the particle simulation and three times for the rigid body simulation.
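The C-style layout lends itself to a tight integration loop over plain arrays. The sketch below shows what an explicit Euler step could look like in that style; the struct layout and field names are hypothetical, not the engine's actual ones.

    // C-style particle pool with an explicit Euler integration step.
    #include <cstddef>

    const std::size_t MAX_PARTICLES = 4096;

    struct ParticlePool {
        float px[MAX_PARTICLES], py[MAX_PARTICLES], pz[MAX_PARTICLES]; // position
        float vx[MAX_PARTICLES], vy[MAX_PARTICLES], vz[MAX_PARTICLES]; // velocity
        float fx[MAX_PARTICLES], fy[MAX_PARTICLES], fz[MAX_PARTICLES]; // force
        float invMass[MAX_PARTICLES];
        std::size_t count;
    };

    // One Euler step: v += (f/m) * dt, then x += v * dt.
    static void integrateEuler(ParticlePool& p, float dt) {
        for (std::size_t i = 0; i < p.count; ++i) {
            p.vx[i] += p.fx[i] * p.invMass[i] * dt;
            p.vy[i] += p.fy[i] * p.invMass[i] * dt;
            p.vz[i] += p.fz[i] * p.invMass[i] * dt;
            p.px[i] += p.vx[i] * dt;
            p.py[i] += p.vy[i] * dt;
            p.pz[i] += p.vz[i] * dt;
        }
    }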

Main features

  • Euler, Midpoint and RK4 integration methods
  • Full 3D rigid body implementation with inertia tensors
  • Particle and rigid body simulation
  • Visualisation of broad-phase body-to-body collision detection (AABBs with sweep and prune) and narrow-phase body-to-plane collision detection; a sweep-and-prune sketch follows this list
  • Body-to-plane collision response
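Sweep and prune sorts the bodies' AABB intervals along one axis and only tests pairs whose intervals overlap, which cheaply prunes most pairs. A generic C++ sketch, with hypothetical types, could look like this:

    // Broad-phase sweep and prune along the x axis.
    #include <algorithm>
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct AABB { float minX, maxX; int body; };

    // Returns candidate body pairs whose x-extents overlap.
    std::vector<std::pair<int, int>> sweepAndPrune(std::vector<AABB> boxes) {
        std::sort(boxes.begin(), boxes.end(),
                  [](const AABB& a, const AABB& b) { return a.minX < b.minX; });

        std::vector<std::pair<int, int>> pairs;
        for (std::size_t i = 0; i < boxes.size(); ++i) {
            for (std::size_t j = i + 1; j < boxes.size(); ++j) {
                if (boxes[j].minX > boxes[i].maxX) break;  // no later box overlaps
                pairs.emplace_back(boxes[i].body, boxes[j].body);
            }
        }
        return pairs;  // candidates still need y/z overlap and narrow-phase tests
    }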

BSc Dissertation

This BSc dissertation project for the University of Pisa implements a web page speech synthesizer with voice command recognition for visually impaired users. It was created in collaboration with the CNR/Italian W3C Office (Dr. Oreste Signore) and DotVocal; the academic supervisor was Dr. Roberto Bruni of the University of Pisa.

The project is composed of a Firefox plugin and a local C++/Java service. The service provides speech synthesis and voice recognition features, while the Firefox plugin is responsible for retrieving and parsing web pages and providing visual hints during the speech synthesis.