In the surveillance world there are certain grand challenges: holy grails that researchers and users of surveillance pursue doggedly, spurred on by the technical problems they pose.
Paramount among these is real-time face-in-the-crowd recognition: a system advanced enough to sift through large crowds of people, none of whom are consciously facing the CCTV cameras, and still identify individuals.
Not for nothing is this type of face recognition referred to as the “killer application” in biometrics.
Right at the centre of this is computer-assisted video analytics, a field that emerged and rose to prominence in the last decade.
Video analytics uses computers to analyse the feeds from security cameras automatically, helping human operators detect incidents in real time.
This has become possible thanks to advances in the processing power of computers coupled with the availability of better statistical machine learning tools.
These machine-learning tools come largely from the research community's work on computer vision and pattern recognition (CVPR), where surveillance is now a rapidly growing area, so much so that it has its own dedicated conferences.
Recent innovations in image and video processing technologies, such as IP cameras, low-bandwidth video codecs and graphics processing units (GPUs), together with smarter video analysis algorithms, have cleared the way for computers to monitor the video feeds of huge CCTV networks effectively.
At the lower end, these capabilities include:
- perimeter protection, also known as “trip wire”, which works by visually detecting persons or vehicles entering restricted areas.
- left or removed object detection, which detects left luggage and parcels in busy airports, or the theft of paintings from an art gallery.
- wrong-way detection, used to detect passengers bypassing security checks by entering the “sterile” area of an airport terminal via the exit doors from the baggage collection area.
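Once an object tracker supplies a position for each person or vehicle in every frame, trip-wire and wrong-way detection largely reduce to geometry: did the object's path cross a virtual line, and in which direction? The sketch below is a minimal illustration under that assumption. The tracker output (a list of per-frame centroids) and the function names `side` and `wire_crossings` are hypothetical, and commercial products may work quite differently.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def side(a: Point, b: Point, p: Point) -> float:
    """Cross-product test: > 0 if p lies left of the directed line a->b, < 0 if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def wire_crossings(wire: Tuple[Point, Point], track: Sequence[Point]) -> List[Tuple[int, str]]:
    """Return (frame_index, direction) for each step in which the track crosses the wire segment.

    `track` is a hypothetical list of per-frame object centroids from some tracker.
    Steps that only touch the wire or its endpoints are not counted.
    """
    a, b = wire
    events = []
    for i in range(1, len(track)):
        p, q = track[i - 1], track[i]
        s_p, s_q = side(a, b, p), side(a, b, q)
        # The object must move from one side of the wire's line to the other...
        if s_p * s_q >= 0:
            continue
        # ...and the wire's endpoints must straddle the step, so the crossing
        # falls on the wire itself rather than on its extension.
        if side(p, q, a) * side(p, q, b) >= 0:
            continue
        events.append((i, "left_to_right" if s_p > 0 else "right_to_left"))
    return events


# A vertical wire at x = 0; the object walks from left to right across it.
wire = ((0.0, 0.0), (0.0, 10.0))
print(wire_crossings(wire, [(-2.0, 5.0), (-1.0, 5.0), (1.0, 5.0), (2.0, 5.0)]))
# [(2, 'left_to_right')]
```

The direction label is what turns a plain trip wire into wrong-way detection: an alarm would be raised only for crossings in the forbidden direction, such as from the baggage hall back into the sterile area.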
Video analytics at the higher end of the market can perform sophisticated tasks such as:
- Automated Number Plate Recognition (ANPR), which is used on toll roads for revenue collection, and by law enforcement agencies to detect unregistered vehicles.
- crowd management, to detect excessive queues.
- face recognition, which matches faces to names using a large image database.
- people tracking.
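Matching "faces to names using a large image database" is often framed as open-set identification: compare the probe face with every enrolled face, take the best match, and report a name only if its similarity clears a threshold. The sketch below assumes each face has already been reduced to a fixed-length feature vector by some feature extractor, which is not shown and is hypothetical here; cosine similarity is one common choice of score, not necessarily what any particular system uses.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def identify(probe: Sequence[float],
             gallery: Dict[str, Sequence[float]],
             threshold: float) -> Tuple[Optional[str], float]:
    """Open-set identification: return (name, score) of the best match,
    or (None, score) if even the best match falls below `threshold`."""
    best_name, best_score = None, -1.0
    # Linear scan: one comparison per enrolled face, so cost grows with gallery size.
    for name, reference in gallery.items():
        score = cosine_similarity(probe, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Toy three-dimensional "features" for two hypothetical enrolled people.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], gallery, threshold=0.8)[0])  # alice
print(identify([0.0, 0.0, 1.0], gallery, threshold=0.8))     # (None, 0.0)
```

The threshold is the key operating parameter: lowering it catches more genuine matches but also flags more strangers, which is exactly the false-alarm trade-off that frustrates operators.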
Technologies such as ANPR, or iris identification/verification from a distance, now achieve levels of accuracy that are acceptable for practical use.
This has opened the way for mainstream applications such as the collection of road tolls by video licence plate matching and access control for securing buildings.
There are, however, many unhappy users of video analytics technologies in areas such as face recognition. In many cases the reason is a lack of reliability in some current commercial systems.
This usually comes down to an excessive number of false alarms, which makes such systems impractical and annoying. As with the boy who cried wolf, if a system raises hundreds of false alarms every day, operators are unlikely to pay attention to the genuine ones: they may simply switch the alarm system off.
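A little arithmetic shows how a system that sounds accurate on paper can still cry wolf. The figures below are purely hypothetical: a camera network that sees 100,000 faces a day, two of which belong to people on a watchlist, with a false match rate of 0.1% per face and a 90% chance of flagging a genuine watchlist face. For simplicity the false match rate is treated as applying once per face against the whole watchlist.

```python
def alarm_stats(faces_per_day: int, watchlist_hits_per_day: float,
                false_match_rate: float, hit_rate: float):
    """Expected daily false and true alarms, and the share of alarms that are genuine."""
    impostors = faces_per_day - watchlist_hits_per_day   # faces NOT on the watchlist
    false_alarms = impostors * false_match_rate           # wrongly flagged
    true_alarms = watchlist_hits_per_day * hit_rate       # correctly flagged
    total = false_alarms + true_alarms
    share_genuine = true_alarms / total if total > 0 else 0.0
    return false_alarms, true_alarms, share_genuine


# Hypothetical figures: 100,000 faces per day, 2 on the watchlist,
# a 0.1% false match rate and a 90% hit rate.
fa, ta, share = alarm_stats(100_000, 2, 0.001, 0.9)
print(f"{fa:.0f} false alarms, {ta:.1f} true alarms, {share:.1%} of alarms genuine")
# 100 false alarms, 1.8 true alarms, 1.8% of alarms genuine
```

Under these assumptions fewer than one alarm in fifty is genuine, because the rare event being searched for is swamped by the sheer number of innocent faces.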
Strike a pose
Key problems for “face in the crowd” identification in public spaces (such as airports or train stations) are mismatches in pose, illumination and expression between the query (or “probe”) face and the reference faces.
What do we mean by this?
- Pose: a person under surveillance may not know they’re being filmed and is not necessarily looking at the camera.
- Illumination: people may be lit differently because of shadows or the time of day.
- Expression: the person under surveillance may be talking to a friend.
In addition to robustness and accuracy, the ability to search large face databases quickly is of paramount importance for surveillance. A recognition system should be able to handle large numbers of people (e.g. at peak hour at a railway station), possibly processing hundreds of video streams from multiple CCTV cameras simultaneously.
While it is possible to set up elaborate parallel computation machines, there are always practical cost considerations limiting the number of computers available for processing.
In this context, a system should be able to run in real time (or faster, in the case of recorded footage), which necessarily limits its complexity. But the sheer usefulness of such a system, together with the scale of the technical challenge of uncontrolled face recognition from CCTV video, is why face-in-the-crowd recognition is considered something of a holy grail.
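Why real-time operation "necessarily limits complexity" becomes clearer once it is turned into a time budget. The sketch below assumes a hypothetical deployment of 100 CCTV streams at 25 frames per second, spread perfectly evenly over 16 processing workers with no other overhead; real systems would fall short of this ideal. Face detection, tracking and database matching for each frame all have to fit inside the resulting budget.

```python
def per_frame_budget_ms(num_streams: int, fps: float, num_workers: int,
                        speedup: float = 1.0) -> float:
    """Milliseconds of compute available per frame, assuming frames are spread
    perfectly evenly over `num_workers` and there is no other overhead.
    `speedup` > 1 models processing recorded footage faster than real time."""
    frames_per_second = num_streams * fps * speedup
    return 1000.0 * num_workers / frames_per_second


# Hypothetical deployment: 100 CCTV streams at 25 frames per second on 16 workers.
live = per_frame_budget_ms(100, 25, 16)
archive = per_frame_budget_ms(100, 25, 16, speedup=4)
print(f"live: {live:.1f} ms/frame, recorded at 4x: {archive:.1f} ms/frame")
# live: 6.4 ms/frame, recorded at 4x: 1.6 ms/frame
```

Doubling the number of streams, or searching recorded footage faster, shrinks the budget in direct proportion unless more hardware is added, which is where the cost considerations mentioned above come in.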
There have been recent breakthroughs towards solving, or at least partially addressing, this grand challenge as researchers hone their skills in competitions such as the Multiple Biometric Grand Challenge, sponsored by the US’s National Institute of Standards and Technology.
As you would imagine, breakthroughs in this area are viewed with great interest, not least in terms of the possible applications for national security, which we will discuss tomorrow.
This is the third in our five-part series on advanced surveillance.