5 Things I Learned at The Huawei Ireland Research Video Summit 2017


The Huawei Ireland Research Video Summit 2017 took place on the 21st of November in Trinity College Dublin.

This event brought together a host of the most talented video analysis researchers from top European academic institutions and industry. Over the course of the day, talks highlighted the achievements and open challenges of current video analysis research.

The main areas of discussion included Recognition & Tracking, Behavior Analysis, and Security & Surveillance. It was difficult to choose, but I have distilled what I learned into the five insights I consider most noteworthy.


1. You Can Now House-Hunt From the Comfort of Your Own Home


Professor Theo Gevers of the University of Amsterdam showed off a neat application of computer vision for house-hunters and real estate agents alike.

A startup Gevers is involved with, ScanM, uses deep learning to create full 3D models of properties, letting you take virtual tours of houses across the globe without moving a muscle! Similar technology can also model how a room in your house would look if refurnished in a particular style, allowing you to preview the end result before committing to any physical changes.


2. The Rise of Online Streaming Services (e.g. Netflix) Can Be Credited to “Caring for the Pixels”


Professor Anil Kokaram, Head of the Electronic Engineering Department at Trinity College Dublin, gave a talk highlighting the influence video quality has on which streaming outlet the public opts for.

In particular, he noted that from 2010 onwards, streaming services such as Netflix grew in popularity not just because of the on-demand catalogue of TV shows and movies they offered, but also because the cinematic-level video quality and denoising available on these platforms outperformed regular TV broadcasts. He also pointed to evidence that a relatively small reduction in video quality (a 4 dB drop in signal-to-noise ratio) was enough to make viewers stop watching YouTube videos, emphasising the role that caring for pixel quality plays in maintaining a viewership.
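
To give a feel for the numbers, here is a minimal Python sketch (my own illustration, not from the talk) of what a 4 dB SNR drop amounts to: the noise power rises by a factor of 10^(4/10), roughly 2.5x.

    import numpy as np

    def snr_db(clean, noisy):
        # Signal-to-noise ratio in decibels: 10*log10(signal power / noise power)
        noise = noisy - clean
        return 10 * np.log10(np.sum(clean**2) / np.sum(noise**2))

    rng = np.random.default_rng(0)
    frame = rng.random((480, 640))                      # stand-in for a video frame
    noisy = frame + rng.normal(0.0, 0.05, frame.shape)  # lightly degraded version

    # A 4 dB SNR drop means the noise power grows by 10**(4/10) ~= 2.5x,
    # i.e. the noise amplitude (std) grows by sqrt(2.5) ~= 1.58x.
    noisier = frame + rng.normal(0.0, 0.05 * np.sqrt(10**0.4), frame.shape)

    print(snr_db(frame, noisy))    # baseline SNR, ~21 dB for this frame
    print(snr_db(frame, noisier))  # ~4 dB lower

In decibel terms the change looks tiny, which is exactly the point: a degradation this modest was apparently enough to lose viewers.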


3. Computer Vision Can Help Your Hearing

Yes, you heard me correctly: coupling audio and visual data has been shown to significantly improve speech recognition performance.

Professor Shmuel Peleg of The Hebrew University of Jerusalem showcased how computer vision can help with recognising what a particular person is saying in a noisy environment, provided there is additional visual data of that person’s facial articulations. Possible applications include hearing aids and processing news reports from the field, as both situations may call for enhancing or amplifying one speaker’s voice over the background noise.


4. Croke Park Is a Miniature City, From an Internet of Things Perspective


Professor Noel E. O’Connor (Dublin City University) gave a presentation on the future use of sensor networks, including video footage, to help authorities better manage city life, from crowd control to potential flooding.

Such systems will eventually be deployed in cities across the globe; in the meantime, a stadium event has been found to be an ideal testbed for these networks at a smaller scale. Many of the activities that take place in such a location mimic a city’s main parameters of interest, from analysing crowd behaviour and numbers for safety purposes to monitoring the environment and infrastructure for possible safety and sustainability issues.


5. The Amount of “Fake News” Looks Set to Rise


The development of “Face2Face: Real-time Face Capture and Reenactment of RGB Videos” now allows footage of a “target” speaker (e.g. Donald Trump) to be convincingly altered in real time based on the actions of an actor, making it appear as if the target speaker is saying the actor’s words.

Creating such realistic results involves accounting for the pose, illumination and expression of both the actor and the target speaker, and transferring the appropriate features of the actor onto the target. The technique has many beneficial applications, such as virtual reality, facial capture for gaming, and foreign-language video dubbing. Let’s just hope we avoid using it alongside speech synthesis technology for the nefarious purpose of quite literally putting words in other people’s mouths, causing a potential upsurge in “fake news”!
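
For intuition, here is a toy Python sketch of the core idea (my own illustration, not the authors’ code; the real system fits a full parametric 3D face model per frame and does far more work). Each face is reduced to a handful of hypothetical coefficient vectors, and reenactment keeps everything of the target except the expression, which is taken from the actor:

    # Hypothetical coefficient vectors, for illustration only.
    actor = {
        "identity":     [0.12, -0.40, 0.81],
        "expression":   [0.90,  0.05, -0.30],  # the actor's facial articulation
        "pose":         [0.00,  0.10,  0.00],
        "illumination": [1.00,  0.80,  0.60],
    }
    target = {
        "identity":     [-0.55, 0.22, 0.10],
        "expression":   [0.05,  0.00, 0.02],
        "pose":         [0.02, -0.05, 0.00],
        "illumination": [0.90,  0.90, 0.70],
    }

    def reenact(actor, target):
        # Keep the target's identity, pose and lighting; transfer only the
        # actor's expression coefficients onto the target's face.
        out = dict(target)
        out["expression"] = actor["expression"]
        return out

    print(reenact(actor, target))

Because only the expression channel is swapped, the output still looks and is lit like the target, which is what makes the reenacted footage so convincing.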

Conclusion

Of course, I gained many more insights over the course of the day, such as that bigger does not necessarily mean better (in terms of network size) and that “the more” does not necessarily mean “the merrier” (in terms of training data). There were too many to describe in detail in a single blog post, though, so you’ll have to settle for these five.