Welcome to Disaster.Stream Bringing hard-won Lessons-Learned from Disaster Recovery Responders
Dec. 20, 2022

US Digital War Biometric Watchlist Fails Mid-war Lessons-learned

In 2006 the US military's biometric-enabled watchlist, used to find insurgents among the general population, failed. I was asked to bring my team to diagnose and solve the problem.

The Army G2 and CIO G6 asked us for a communications “10-day AOR snapshot” analysis. The findings provided definitive evidence of application, operational and network packet loss problems. Efforts included iterative sessions with Fort Huachuca LTO (Language Technology Office) application developers and lab tests of Distributed Server Synchronization (DSS) performance. These and other initiatives prioritized the optimization opportunities, resulting in the following quick-win actions:

  • DSS software development improvements
  • Server Hardware Refresh & Server Farms to Overcome Processing Bottlenecks
  • Leadership Discussion of Network Problems and Application Performance Monitoring

Thereafter, in 2007, PM Biometrics teamed with CENTCOM to create the E2E / CENTCOM TNMA Instrumentation program, which is the subject of this report.

The team was asked to review, use and/or assemble new instrumentation monitoring capabilities to address application performance, operations and associated network issues affecting Biometrics. Analysis with this particular “End to End” approach began in April 2007 with a tour of the AOR and CONUS locations. The AOR tour included Bahrain, Qatar, Iraq, Afghanistan, Kuwait and Djibouti. The CONUS locations included Charlottesville, VA, and the BFC and the FBI, both in Clarksburg, WV. Joint Chiefs, OSD and PM Biometrics directed the teaming arrangement with CENTCOM J6 operations leadership. The objective of the tour was to evaluate existing monitoring solutions that might provide visibility into biometrics application traffic. The resulting joint conclusion was to build out new instrumentation to meet the immediate needs of this E2E Study and the subsequent long-term needs of CENTCOM and the COCOMs, rolling E2E into the new requirements of TNMA (Theatre Network Management Architecture).

As part of the Area of Responsibility (AOR) review in April 2007, the team met with the new Biometrics leadership in Iraq, who asked the team to immediately project “when the system would break” and to identify which mitigation steps should be performed, in priority order, to keep it from breaking. This is the list:

  • Improve DSS Replication Schema by reducing peers and using Server Farms
  • Roll out the new improvements in DSS software
  • Perform and provide occasional AOR database and attachment metrics to identify incongruent counts and bottlenecks by location
  • Provide rigid documentation and configuration management control over the DSS Replication Schema and keep the teams at CENTCOM updated as changes occur.
  • Provide DSS Logs to this team to quantify improvements made by DSS versions so ongoing bottleneck identification could be performed.
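The third item above, comparing database counts across locations to spot incongruent replication peers, can be sketched in a few lines. This is a minimal illustration only: the site names, the counts, the 2% tolerance and the use of the median as the reference value are all assumptions made for the example, not details of the DSS tooling or of the original study.

```python
from statistics import median

def find_incongruent(counts, tolerance=0.02):
    """Flag locations whose record count deviates from the median count.

    counts: dict mapping a location name to its record count.
    tolerance: allowed fractional deviation (assumed value, tune per system).
    Returns a dict of flagged locations and their fractional deviation.
    """
    reference = median(counts.values())
    flagged = {}
    for location, count in counts.items():
        deviation = (count - reference) / reference
        if abs(deviation) > tolerance:
            flagged[location] = round(deviation, 4)
    return flagged

# Hypothetical record counts reported by four replication peers.
counts = {"site_a": 100_000, "site_b": 99_500, "site_c": 82_000, "site_d": 100_200}
print(find_incongruent(counts))
```

With these made-up numbers only site_c is flagged, because it lags well behind its peers. A real check would also compare attachment counts and track the results over time.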

Item 1 was completed and provided improved replication. Item 2, DSS 2.7, was completed in July and realized the recommended data volume and replication speed improvements.

The joint CENTCOM and PM Biometrics E2E and TNMA systems were designed, fully built and implemented in the AOR and CONUS locations.

CENTCOM has begun transitioning network volume and routing-related metrics from the E2E Study to TNMA operations, while Biometrics analysis continues as an E2E Study objective.

Objective

Our job has been to gain an understanding of biometric application end-to-end performance and traffic volume, and to identify opportunities for improvement. The biometrics family is a globally distributed set of network-intrinsic applications. Successful network-intrinsic transactions depend on the simultaneous performance of the biometric components and the intervening network to deliver robust and consistent results. This network-intrinsic nature allows biometric transactions to be monitored by way of strategically placed network instrumentation. Biometric components communicate with one another and with database servers, email servers and collection peripherals across the military networks in the AOR and CONUS locations.

Repeating one-shot analyses such as the one performed in 2006 would not prioritize and solve the continual problems of growing network-intrinsic biometrics applications. Performance issues with biometric systems fall into two categories, both requiring regular iterative analysis:

                Recurring

Effects of daily server moves, adds and changes, biometric replication changes, biometric traffic volume, performance degradation, dynamic network route flaps, packet loss induced by errors, QoS and capacity bottlenecks. All of these are examples of problematic moving targets that change every day.

                Evolutionary

Application version update characteristics, mission changes, and major network architecture and technology evolutions. These are examples of macro changes that can be trended to quantify improvements or warn of capacity limitations. Particularly important are historical metrics, which afford extrapolation of trends to accommodate commanders' future missions.
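Extrapolating a historical trend to warn of a capacity limit can be illustrated with a simple least-squares line. The sketch below is an assumption-laden example, not the study's method: the monthly utilization figures and the 90% limit are invented, and a straight-line fit is only one of several reasonable trend models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def periods_until_limit(xs, ys, limit):
    """Estimate how many periods after the last sample the trend reaches `limit`.

    Returns None when the trend is flat or falling (no projected crossing).
    """
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - xs[-1])

# Hypothetical monthly link utilization (percent) and an assumed 90% limit.
months = [0, 1, 2, 3, 4, 5]
utilization_pct = [40, 44, 47, 52, 55, 60]
print(periods_until_limit(months, utilization_pct, limit=90))
```

The same function applies to any metric that is kept historically, such as database size or daily enrollment volume. The value of the projection depends on how long and how consistently the metric has been collected.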

The “snapshot” provided many diagnostic and mitigation recommendations, yet clearly the recurring and evolutionary issues would require ongoing analysis. Historical performance trends, and data volume granularity sufficient to identify the servers and applications critical to biometrics, are essential to support the study and should continue to be maintained. The End to End Biometrics Study has built out distributed instrumentation to follow critical biometric paths from field enrollment to CONUS locations and match traffic back toward the warfighter-user. Sophisticated data analysis and problem triangulation can now take place to isolate the failure domain, in many cases from “somewhere” to exactly where.
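One simple way to picture this kind of isolation is to compare what successive monitors along a path observe and find the segment that adds the most delay. The sketch below assumes each monitor reports a cumulative response time measured from the field site. The monitor names and timings are hypothetical, and the real instrumentation correlates far more than a single number per location.

```python
def worst_segment(path, cumulative_ms):
    """Return ((from_monitor, to_monitor), added_ms) for the slowest segment.

    path: ordered monitor names, field site first.
    cumulative_ms: response time observed at each monitor, in the same order.
    """
    if len(path) != len(cumulative_ms) or len(path) < 2:
        raise ValueError("need matching lists with at least two monitors")
    worst = None
    for i in range(len(path) - 1):
        # Delay added between two adjacent monitors.
        delta = cumulative_ms[i + 1] - cumulative_ms[i]
        if worst is None or delta > worst[1]:
            worst = ((path[i], path[i + 1]), delta)
    return worst

# Hypothetical monitors along one enrollment path, with made-up timings.
path = ["field_site", "regional_hub", "theater_gateway", "conus_server"]
cumulative_ms = [5, 40, 650, 700]
segment, added_ms = worst_segment(path, cumulative_ms)
print(segment, added_ms)
```

In this made-up case the segment between regional_hub and theater_gateway contributes most of the delay, which narrows the search from the whole path to one link and its two endpoints.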

Executive Summary Bottom Line Up Front

  • The End to End Study monitoring architecture has been successfully deployed throughout the AOR and CONUS locations as planned. E2E initial operating capability before 2012.
  • Convergence of network and application monitoring disciplines begins.
  • Early significant findings provided to CENTCOM and PM Biometrics for mitigation analysis.
  • Leadership appeals:
    • Encourage rapid and ongoing finding follow-up accountability.
    • Encourage network and application system management collaboration.
    • Encourage proactive Biometric system and Network documentation.
  • E2E Study continues analysis on new and existing findings.
  • Next Steps –
    • CENTCOM TNMA Support initiatives
    • Training initiatives
    • Biometrics Support initiatives
    • Biometrics Application AOR simulation lab environment
    • NG ABIS development iterative analysis
    • BAT Development oriented iterative performance analysis
    • Embedded metrics and reporting initiatives
    • Biometric documentation leadership and change control initiatives

Executive Summary

This project’s leadership catalysts (Joint Chiefs, OSD, Army CIO-G6 and G2) are a testament to exemplary executive vision. Their support provided for an uncommon collaborative partnership between PEO-EIS PM Biometrics and CENTCOM as the executing agent. CENTCOM leadership, personnel and contractors worked together with COCOMs and dedicated warfighters in every instrumented location in the AOR to realize new combined application and network management capabilities.

Global Biometric application ownership is an onerous responsibility. Transactions traverse dynamically changing links between hundreds of locations, serving thousands of users. Biometric applications are barely out of the lab, are growing rapidly and already serve an essential, productive mission. Performance-limiting problems often have interrelated application and network factors. When problems occur, typical application owners have no place to turn. Few understand the many unique components, or the technology, architecture, extensive reach, operations, users or mission. Biometric transactions traverse many remote network ownership domains whose definition of success is a “ping” (a simple connectivity test), which provides no advocacy for an application’s performance or consistent reliability end to end. The educational process is long, iterative and never-ending.

When performance is degraded, monitors at multiple locations provide visibility into which location, response time component or process is causing problems. Monitors report to central consoles, which time-correlate performance across the entire global environment. For application and conversation volume, existing routers already part of the network were configured to be the source of data.

Five key concentrations of biometric servers received monitors to gather application transaction performance information. Monitored statistics are measured locally, and the resulting data is significantly reduced before being moved across satellite links to performance-correlating consoles at CENTCOM. Analysts access the consoles from their web browsers.
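The idea of measuring locally and reducing the data before it crosses a satellite link can be sketched as a per-interval summary. The five-minute interval, the field names and the sample values below are assumptions chosen for illustration. The actual monitors and their export format are not described in this report.

```python
from collections import defaultdict

def summarize(samples, interval_s=300):
    """Reduce raw (timestamp_s, response_ms) samples to one summary per interval.

    Only the summaries, not the raw samples, would be sent to the central console.
    """
    buckets = defaultdict(list)
    for timestamp_s, response_ms in samples:
        # Align each sample to the start of its interval.
        buckets[timestamp_s - timestamp_s % interval_s].append(response_ms)
    summaries = []
    for start in sorted(buckets):
        values = buckets[start]
        summaries.append({
            "interval_start": start,
            "count": len(values),
            "min_ms": min(values),
            "max_ms": max(values),
            "mean_ms": sum(values) / len(values),
        })
    return summaries

# Hypothetical transaction response times collected by one local monitor.
samples = [(0, 100), (10, 200), (299, 300), (300, 50)]
for row in summarize(samples):
    print(row)
```

Because each summary carries its interval start time, a central console can line up summaries from different locations on a common timeline. The trade-off is that individual transactions can no longer be inspected centrally, so the interval length should match the finest granularity analysts expect to need.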

Application-related instrumentation provides for both application and many network management needs. Thus, application and network metrics and disciplines are beginning to converge. Combining their instrumentation reduces components, cost, training and the often unproductive friction between the two disciplines (deflective finger-pointing). It is typical in the industry for application owners to grow weary of reporting network-intrinsic performance problems that network maintainers do not understand and are not required to support.

The partnership between PM Biometrics and CENTCOM exemplifies the convergence of application and network management with this report’s early findings pregnant with opportunity for long term mutual success.    

Leadership Appeals

The new leadership appeal is to advocate responsible action to address the problems and account for the findings.  The initial danger is to allow maintainers to focus (arm wave) on what might be lower priority soft or inconclusive findings, ignoring or minimizing the many prioritized definitive hard findings. 

This process will require learning new metrics, responsibilities and relationships. It will require following each error symptom until it is fully understood, mitigated or answered. There are big problems to address and “chip away at”, which will require tenacity on the part of all organizations. The status quo responses of “that’s not my problem”, “it’s a bad application and shouldn’t be used” or “this is a war network, of course there are problems” should be replaced with “let’s fix everything we can”, “let’s quantify the results” or “we’ll learn valuable new skills while we improve user performance”.

This should be viewed as an opportunity for technologists and leadership to work together to provide warfighter end users with a new level of application performance excellence and advocacy. Fixing these identified and prioritized problems will eliminate many real and “tyranny of the urgent” issues occupying valuable technologist, staff reporting and leadership agenda time.

Continued attention should be placed on joint ownership and collaboration. Access to result-oriented tools should eventually provide consolidation across system management platforms. There seems to be a symphony of “my network”, “my network management system” and “my QoS methods (selective performance limiting parameters)” rather than “our”. This paradigm can continue to change through CENTCOM’s already effective wooing of COCOMs toward common goals.

The brightest technology leadership will be compelled to participate and support forums where the best ideas are welcome.  DoD has long been aware of the benefit and result of collaborative design methodologies from its spawning of the Internet through academic research channels. 

Internet RFCs are very carefully worded collaborative “Requests For Comments” that avoid the use of “mandates” yet attain the most adhered-to unspoken, peer-to-peer mandates in technology history. Some of the earliest RFCs were not about technology but about how the Internet community would collaborate: adopting the “best information at the time” a decision is made and sticking with it long enough to provide an architectural return on the investment, so that a new solution could build on that success. Internet psychology has proven effective and is likely the most ubiquitously admired of modern DoD initiatives.

During the review of the AOR general network architecture and the associated network management documentation, including diagrams and link provisioning details, it became obvious that some areas had meticulous micro details without an effective macro view, and vice versa. CENTCOM recently built an excellent set of large-format consolidated network views, but lacks the space to display them and use them for technologist education and training.

They also suffer from not being dynamically linked to speed provisioning updates occurring regularly. The larger the network or system, and the more people making changes and responsible for troubleshooting, the more the documentation needs to be part of regular proactive operations.

The documentation illustrates the system architecture. If technologists are not able to understand the architecture, they are bound to destroy it through ignorance. Such diagrams and details are the key to creating and maintaining a technical culture of architecture awareness, and are used to teach and model best-practice behavior in and between COCOMs.

Biometric architecture documentation, component lists and diagrams are emerging, but remain rudimentary, with many incongruities and without regular update schedules. With the proliferation of biometric FSEs (Field Support Engineers) in the AOR, and many support personnel ready to assist the Biometrics Community at CENTCOM and the COCOMs, a critical danger exists that assistance with problems will be inhibited by the lack of up-to-date information on servers, roles, DSS replication and the biometric enrollment forwarding architecture. This is likely to grow worse as more technologists institute unrecorded and undocumented changes.

Once change control approves a change without the associated up-to-date diagrams and lists, it is too late, as technologists are not typically known for volunteering to update the diagram after the technical change is made.

The best way to institute effective updated documentation is by requiring it of technologists as they request a desired change.  In this manner documentation will always be up to date as changes occur. 

Most network and application problems can be traced to changes, and ultimately to poor documentation and change control discipline. Leadership can play a significant role in honoring and recognizing the value of these best practices.