List Processing (LISP) was the first programming language for Artificial Intelligence (AI). From its inception in 1958, it set the stage for a future in which machines could assist in writing and optimising code. Today, AI code generators represent a significant milestone in software development, offering unprecedented speed and efficiency but also introducing challenges, including new cyber security risks. In this article, we explore some of the factors that developers and security professionals should consider as they seek to harness the benefits of AI code generators whilst mitigating the associated risks.
“Our ultimate objective is to make programmes that learn from their experience as effectively as humans do” – John McCarthy, creator of LISP
AI code generators, often referred to as AI pair programmers or AI code assistants, use algorithms to pre-emptively suggest code. They are attractive to developers because of the material productivity gains that they offer: they can create new functions, code snippets or entire programmes, learning from vast open-source code repositories. Familiar names include GitHub’s Copilot, Codeium and Amazon’s Q Developer. Like other Large Language Models such as ChatGPT, Claude or Gemini, they predict the user’s next most likely requirement and generate code accordingly. Many factors can influence the quality of this code, including the specificity of the prompt (the direct user instruction), the programming languages available, and the volume of training data consumed.
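By way of illustration, the sketch below shows the kind of exchange a developer might have with an AI pair programmer: a natural-language comment serves as the prompt, and the tool proposes a completion. The example is our own, written in Python, and does not reproduce the output of any particular tool.

```python
# Developer's prompt, written as a comment:
# "function that checks whether a string is a valid UK postcode"

import re

# A plausible assistant completion: a regex-based validator.
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$", re.IGNORECASE)

def is_valid_uk_postcode(postcode: str) -> bool:
    return bool(UK_POSTCODE.match(postcode.strip()))

print(is_valid_uk_postcode("SW1A 1AA"))  # True
print(is_valid_uk_postcode("12345"))     # False
```

A more specific prompt (for example, one naming the expected postcode formats) would typically steer the tool towards a more precise completion, which is why prompt specificity matters.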
The primary advantage of these tools is increased productivity, but there are other benefits, such as reduced cognitive load and the elimination of repetitive tasks. AI code generators can enhance the user experience and allow developers to complete code more quickly and efficiently, freeing time to focus on complex problems. However, there is a risk that developers place too much trust in the code generated.
Whilst AI code generators can enhance the Secure Development Life-Cycle, they can also introduce security vulnerabilities if not properly managed. The principle of “Garbage In, Garbage Out” highlights the necessity for high-quality training datasets: poorly written or insecure training data can result in the AI generating similarly flawed code. Consequently, the training datasets and code repositories that AI code generators draw on must be of high quality. Further risks arise from a lack of contextual understanding, the possibility of data privacy breaches, and even the potential for bias and discrimination.
As an illustration of these weaknesses, Stanford University demonstrated this risk in a study involving 47 participants. One group used AI pair programmers, whilst a control group relied solely on their own knowledge of three programming languages: Python, JavaScript and C. Across a series of five coding tasks, the group relying on its own expertise produced more secure code, with fewer vulnerabilities, on four of the five tasks. Those using AI assistance not only introduced more vulnerabilities but also mistakenly believed that they were writing more secure code. This outcome underscores AI’s dependence on its training data and the critical role of human oversight: a human developer can assess code and make suitable amendments based on existing knowledge, whereas an AI pair programmer looks for commonality and popular patterns from the past.
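To make this concrete, the sketch below contrasts the kind of completion an assistant trained on popular but unsafe public code might offer with what a security-aware reviewer would require. The scenario and function names are hypothetical, chosen purely for illustration.

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Interpolating user input directly into the query string: a common
    # pattern in public code, and vulnerable to SQL injection.
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Parameterised query: the database driver handles the input safely.
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret'), ('bob', 'hunter2')")

# A malicious input that the unsafe version executes as SQL,
# returning every row in the table:
print(find_user_unsafe(conn, "' OR '1'='1"))
# The safe version treats the same input as a literal string:
print(find_user_safe(conn, "' OR '1'='1"))
```

Both functions look plausible at a glance, which is precisely why participants in the study believed their AI-assisted code was secure when it was not.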
Even though AI code generators can learn to identify and avoid vulnerabilities, ultimate responsibility lies with human developers to validate code outputs and ensure their security. Existing security issues in a codebase can be learned by the underlying Large Language Model during training, and this in turn can erode initial productivity gains through the need for additional code reviews, issue rectification and retesting. However, with proper safeguards and continuous improvement of training datasets, AI code generators can materially enhance productivity whilst contributing to a secure development process.
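One such safeguard is to put AI-generated code through automated security analysis before it reaches human review. The minimal sketch below assumes the open-source Bandit scanner for Python (installed separately via pip) and simply gates on its exit code; a real pipeline would typically run this as part of continuous integration.

```python
import subprocess
import sys

def scan_generated_code(path: str) -> bool:
    """Return True if Bandit reports no issues in the given directory."""
    result = subprocess.run(
        ["bandit", "-r", "-q", path],  # recursive, quiet scan
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Bandit exits non-zero when it finds potential security issues
        print(result.stdout)
    return result.returncode == 0

if __name__ == "__main__":
    ok = scan_generated_code(sys.argv[1] if len(sys.argv) > 1 else "src")
    sys.exit(0 if ok else 1)
```

A scan of this kind catches common issues cheaply, but it complements rather than replaces the human code review described above.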
As the adoption of AI code generators becomes more commonplace, how should we reap the benefits whilst maintaining security? The recommended approach mirrors that of other security processes. Businesses must:

- Conduct a baseline risk assessment to identify and understand the risk level of their development environments before onboarding AI code generators, focusing on factors such as data privacy and code quality.
- Maintain ongoing code reviews of AI-generated output; as confidence in the AI’s output grows through the review of consistently high-quality, secure code, the frequency of reviews can be reduced.
AI-driven code generators are revolutionising development by offering time-saving solutions to complex coding challenges. However, AI systems are only as effective as their training data and instructions, necessitating human integrity checks to assure quality. Security professionals must rigorously review and test AI-generated code to safeguard against vulnerabilities.
By combining the possibilities that AI brings with the strengths and experience of cyber security experts, we can create robust, secure and resilient systems faster than ever before. This synergy enhances productivity and innovation, whilst reducing time to deployment and protecting digital systems against escalating threats.
At 6point6, we support clients throughout their AI adoption journey.
Contact us to learn more about the benefits of AI for your organisation as you head towards a more secure and innovative future.
Aditi Ramachandran manages the delivery of security assurance services, providing governance, risk, compliance, and information assurance to clients in the public sector. Harry Clark has 8 years’ experience managing cyber security risk and governance for complex technical and cyber initiatives.