Rage Against the Algorithm: the Risks of Overestimating Military Artificial Intelligence

Increasing dependency on artificial intelligence (AI) for military technologies is inevitable, and efforts to develop these technologies for use on the battlefield are proceeding apace. However, developers and end-users must ensure the reliability of these technologies, writes Yasmin Afina.

Expert comment | Published 27 August 2020 | Updated 4 September 2020

Yasmin Afina

Former Research Fellow, Digital Society Initiative

F-16 SimuSphere HD flight simulator at Link Simulation in Arlington, Texas, US. Photo: Getty Images.

AI holds the potential to replace humans for tactical tasks in military operations, beyond current applications such as navigation assistance. For example, in the US, the Defense Advanced Research Projects Agency (DARPA) recently held the final round of its AlphaDogfight Trials, in which an algorithm controlling a simulated F-16 fighter was pitted against an Air Force pilot in virtual aerial combat. The algorithm won 5-0. So what does this mean for the future of military operations?

The agency’s deputy director remarked that these tools are now ‘ready for weapons systems designers to be in the toolbox’. At first glance, the dogfight suggests that AI-enabled air combat would provide tremendous military advantages, including freedom from the survival instincts inherent to humans, the ability to operate consistently under acceleration stress beyond the limits of the human body, and high targeting precision.

The outcome of these trials, however, does not mean that this technology is ready for deployment on the battlefield. In fact, an array of considerations must be taken into account prior to deployment and use, namely the ability to adapt to real-life combat situations, hardware limitations and legal compliance.

Testing environment versus real-life applications

First, as with all technologies, the performance of an algorithm in its testing environment is bound to differ from its performance in real-life applications, as seen, for example, with cluster munitions, whose failure rates in the field far exceed those recorded in testing. Google Health, for instance, developed an algorithm to help with diabetic retinopathy screening. While the algorithm’s accuracy rate in the lab was over 90 per cent, it did not perform well out of the lab: because it had been trained on high-quality scans, it rejected more than a fifth of the real-life scans as falling below the required quality threshold. As a result, the process ended up being as time-consuming and costly as traditional screening, if not more so.
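To make this failure mode concrete, here is a minimal sketch in Python (all names, ranges and thresholds are invented for illustration, not drawn from the Google Health study): a quality gate calibrated on pristine lab scans rejects a large share of messier real-world scans, even though the underlying model never changed.

```python
# Hypothetical sketch: a quality gate tuned on lab data rejects
# far more inputs in the field, before the model even runs.
import random

random.seed(0)

QUALITY_THRESHOLD = 0.8  # gate calibrated against high-quality lab scans

def lab_scan_quality() -> float:
    # Lab scans: consistently sharp, well-lit images (invented range).
    return random.uniform(0.85, 1.0)

def field_scan_quality() -> float:
    # Field scans: variable lighting, focus and equipment (invented range).
    return random.uniform(0.7, 1.0)

def rejection_rate(sample, n=10_000) -> float:
    # Share of scans the gate refuses to pass to the model at all.
    return sum(sample() < QUALITY_THRESHOLD for _ in range(n)) / n

print(f"lab rejection rate:   {rejection_rate(lab_scan_quality):.1%}")    # ~0%
print(f"field rejection rate: {rejection_rate(field_scan_quality):.1%}")  # ~33%
```

The model’s lab accuracy is irrelevant to the scans it never sees; the gate alone is enough to erase the promised efficiency gains.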

Similarly, virtual environments akin to the AlphaDogfight Trials do not reflect the full extent of the risks, hazards and unpredictability of real-life combat. In the dogfight exercise, for example, the algorithm had full situational awareness and was repeatedly trained on the rules, parameters and limitations of its operating environment. But in a real-life, dynamic battlefield, the list of variables is long and will inevitably fluctuate: visibility may be poor, extreme weather could affect both operations and aircraft performance, and the behaviour and actions of adversaries will be unpredictable.

Every single eventuality would need to be programmed in line with the commander’s intent in an ever-changing situation; otherwise, the algorithms’ performance would be drastically affected, including in target identification and firing precision.
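A minimal sketch of this brittleness, with entirely invented conditions and manoeuvre names: a response table that covers the training environment fails the moment reality presents a variable that was never programmed.

```python
# Hypothetical sketch: a pre-programmed response table breaks on
# any condition its designers did not anticipate.
RESPONSES = {
    ("clear", "single_adversary"): "standard_intercept",
    ("clear", "multiple_adversaries"): "defensive_split",
}

def choose_manoeuvre(visibility: str, threat: str) -> str:
    try:
        return RESPONSES[(visibility, threat)]
    except KeyError:
        # Real combat: poor visibility, extreme weather, novel adversary
        # behaviour - none of which appeared in the training environment.
        raise RuntimeError(f"no programmed response for {(visibility, threat)!r}")

print(choose_manoeuvre("clear", "single_adversary"))  # works, as in training
choose_manoeuvre("fog", "single_adversary")           # raises: unanticipated eventuality
```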

Hardware limitations

Another consideration relates to the limitations of the hardware that AI systems depend on. Algorithms depend on hardware to operate equipment such as sensors and computer systems, each of which is constrained by physical limits. These can be targeted by an adversary, for example through electronic interference, to disrupt the functioning of the computer systems on which the algorithms run.

Hardware may also be affected involuntarily. For instance, a ‘pilotless’ aircraft controlled by an algorithm can indeed undergo higher accelerations, and thus higher g-forces, than the human body can endure. However, the aircraft itself is also subject to physical limits, such as acceleration thresholds beyond which parts of the aircraft, including its sensors, may be severely damaged, which in turn affects the algorithm’s performance and, ultimately, mission success. It is critical that these physical limits are factored into the equation when deploying these machines, especially when they rely so heavily on sensors.
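The point can be expressed as a simple constraint, sketched below with invented limit values: whatever acceleration the algorithm commands, the executed manoeuvre must be clamped to the tightest hardware tolerance, because exceeding it degrades the very sensors the algorithm relies on.

```python
# Hypothetical sketch: clamping commanded g-loads to airframe and
# sensor tolerances (all values are invented for illustration).
from dataclasses import dataclass

@dataclass
class AirframeLimits:
    max_g_airframe: float = 12.0  # structural limit (invented value)
    max_g_sensors: float = 9.0    # sensor damage threshold (invented value)

def clamp_commanded_g(commanded_g: float, limits: AirframeLimits) -> float:
    """Clamp the algorithm's commanded g-load to the tightest hardware
    constraint: exceeding it risks damaging the very sensors the
    algorithm depends on."""
    return min(commanded_g, limits.max_g_airframe, limits.max_g_sensors)

limits = AirframeLimits()
for g in (4.0, 9.5, 15.0):
    print(f"commanded {g}g -> executed {clamp_commanded_g(g, limits)}g")
```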

Legal compliance

Another major, and perhaps the greatest, consideration relates to the ability to rely on machines for legal compliance. The DARPA dogfight focused exclusively on the algorithm’s ability to control the aircraft and counter the adversary; nothing indicates its ability to ensure that strikes remain within the boundaries of the law.

In an armed conflict, the deployment and use of such systems on the battlefield are not exempt from international humanitarian law (IHL), most notably its customary principles of distinction, proportionality and precautions in attack. Such a system would need to be able to differentiate between civilians, combatants and military objectives; calculate whether its attacks would be proportionate to the set military objective, drawing on live collateral damage estimates; and take the necessary precautions to ensure its attacks remain within the boundaries of the law, including the ability to abort if necessary. It would also need to stay within the rules of engagement for the particular operation.
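The difficulty becomes apparent the moment one tries to write such checks down. The sketch below is entirely hypothetical (no such interface exists in any fielded system) and encodes the three customary principles as a pre-engagement gate that defaults to abort.

```python
# Hypothetical sketch: a pre-engagement legal compliance gate.
# Every field and threshold here is invented for illustration.
from dataclasses import dataclass

@dataclass
class EngagementAssessment:
    is_military_objective: bool   # distinction
    confidence: float             # classifier certainty
    expected_collateral: float    # live collateral damage estimate
    military_advantage: float     # value of the set objective

def authorise(a: EngagementAssessment, min_confidence: float = 0.99) -> bool:
    """Abort unless every check passes; any doubt defaults to no."""
    if not a.is_military_objective:                   # distinction
        return False
    if a.confidence < min_confidence:                 # precautions in attack
        return False
    if a.expected_collateral > a.military_advantage:  # 'proportionality'
        return False
    return True

print(authorise(EngagementAssessment(True, 0.95, 0.1, 1.0)))  # False: confidence too low
```

Note how proportionality collapses here into a bare numerical comparison between two floats. In law it is a contextual, qualitative judgement, which is precisely why reducing it to code is so contested.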

It is therefore critical to incorporate IHL considerations from conception and throughout the development and testing phases of these algorithms, to ensure the machines are sufficiently reliable for legal compliance purposes.

It is also important that developers address the ‘black box’ issue, whereby the algorithm’s calculations are so complex that it is impossible for humans to understand how it arrived at its results. Addressing this opacity is necessary not only to improve the algorithm’s performance over time but also for accountability and investigation purposes in cases of incidents and suspected violations of applicable laws.
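One practical mitigation, sketched below with invented field names, is rigorous decision logging. It does not explain the model’s internal calculations, but it preserves the audit trail that accountability and post-incident investigation require.

```python
# Hypothetical sketch: append one audit record per model decision,
# so incidents can be reconstructed and investigated after the fact.
import json
import time

def log_decision(inputs: dict, output: dict, model_version: str,
                 path: str = "decision_audit.log") -> None:
    """Logging does not open the black box, but it preserves the
    evidence investigators would need to trace a decision."""
    record = {
        "timestamp": time.time(),
        "model_version": model_version,
        "inputs": inputs,
        "output": output,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_decision({"sensor_track_id": 42, "speed_mps": 310.0},
             {"classification": "aircraft", "confidence": 0.97},
             model_version="demo-0.1")
```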

Reliability, testing and experimentation

Algorithms are becoming increasingly powerful and there is no doubt that they will confer tremendous advantages on the military. Over-hype, however, comes at the expense of the machines’ reliability, on the technical front as well as for legal compliance purposes, and must be avoided.

The testing and experimentation phases are key: they are when developers have the opportunity to fine-tune the algorithms. Developers must therefore be held accountable for ensuring the reliability of these machines by incorporating considerations pertaining to performance and accuracy, hardware limitations, and legal compliance. This could help prevent real-life incidents that result from overestimating the capabilities of AI in military operations.