Looking at how Code Reviews can evolve

For decades we’ve practiced peer review of code from fellow teammates. Some fundamentals have stayed the same even as tooling progressed at a steady pace. Teams that didn’t have tooling, I remember, maintained a pact along the lines of “the developer who breaks the build buys coffee for the team”. In a way, that also ensured developers did their due diligence.

With AI development it’s been really interesting to see the difference in opinion when it comes to code reviews, or putting in guard rails to ensure quality standards are met. This is a grey area at the moment, and it will take time for the industry to formalise patterns for how the code review process should look and how it should function. Human-in-the-Loop, they say. But at which point does the human need to be involved? Do humans need to see the actual code, syntax used, etc.? Or do humans need meaningful metrics to gauge the quality of code changes? Think of code as data: raw, unprocessed and lacking context. Metrics, by contrast, are information, and information is what decisions can be made on.

Some meaningful metrics for code changes would be:

  • Percentage of code coverage of the changes made
  • Number of test failures, across unit, integration and functional tests
  • Merge conflicts
  • Static analysis findings
  • Security analysis findings
  • A visual representation of which components or modules in the system are affected by the changes

Looking at the above, a developer should have enough context on the code changes to decide whether to commit them or not. Currently, most pull request tools can be set up to only allow merging if certain constraints are met. If there’s a safe way to automate this review process, it should be done, but done deterministically: if those changes were committed, then reverted, and finally reintroduced with the exact same changes, the results of the review process must be 100% the same. It is this sense of determinism that provides confidence in software delivery.
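To make the idea concrete, here is a minimal sketch of such a deterministic gate. The metric names and thresholds are illustrative assumptions, not a real tool’s API: the point is that the decision is a pure function of the metrics, so identical changes always produce an identical verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewMetrics:
    """Illustrative metrics for a set of code changes (names are assumptions)."""
    coverage_pct: float      # test coverage of the changed code
    test_failures: int       # unit + integration + functional failures
    merge_conflicts: int
    static_findings: int     # static analysis findings
    security_findings: int   # security analysis findings


def can_merge(m: ReviewMetrics, min_coverage: float = 80.0) -> bool:
    """Pure function: the same metrics in always yield the same decision out."""
    return (
        m.coverage_pct >= min_coverage
        and m.test_failures == 0
        and m.merge_conflicts == 0
        and m.static_findings == 0
        and m.security_findings == 0
    )
```

Because `can_merge` has no hidden state or randomness, reverting and reintroducing the exact same changes reproduces the exact same review outcome.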

If developers aren’t willing to abstract away from the inner workings, they aren’t going to realise the full value of using AI for building features. The blending of the engineering and product development roles will slowly find appropriate ratios for developers, as it will for product owners/managers. It will be the tool and framework developers who need to dig deep into code and syntax, no longer the enterprise software engineer. Having worked on Rapid Application Development platforms like Unify (4GL) and custom in-house platforms built on their own Domain Specific Languages (DSLs), your role there is more “Solutions Engineer/Developer” than “Software Engineer/Developer”.

In a full SDLC, there’s QA that occurs after merging in a new feature, and then User Acceptance Testing (UAT). These additional steps are all still guard rails that determine whether a release can be made. If we break away from the idea that committing or merging changes means a possible production deployment, we can reframe what the verification process looks like for a system. I honestly believe that with AI development, either scrum and agile won’t look the same or new SDLCs will appear. The idea that the team needs to deploy the system at the end of each sprint is really tied to the sprint goal, the team’s velocity and the commitment to that velocity. If you find that’s not the case for your team and project, it might be that scrum/agile isn’t the right fit for what you’re trying to accomplish.

The concept of critical thinking isn’t limited to the AI engineering level; I don’t believe so, anyway. Critical thinking should also be applied when planning release and deployment strategies that align with your business’s and/or customers’ needs. We need to remember that agile and scrum are management techniques for software delivery, not standard practices of modern software engineering. To test this thinking, let’s discuss a quick scenario. If you’re building payroll software and your sprint ends in the week of salary payments, do you deploy a new version of the system because agile/scrum says so, or do you align with business needs, understand the risks of deploying in this period, and choose to delay the deployment until salary payments have concluded? Just some food for thought.

An experiment I would love to conduct is to automatically merge in pull requests, have the Human-in-the-Loop at the points of QA and UAT, and finally, only release a version of the software when both QA and UAT pass. Releases and deployments will be irregular, but you will have a strong sense of confidence in each release to the end user.
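The experiment above could be sketched as a small pipeline. Everything here is hypothetical (the class and field names are mine, not any real CI tool’s): pull requests auto-merge without human review, and the only human gates are the QA and UAT sign-offs that decide what is releasable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Change:
    """A merged pull request awaiting human verification (illustrative)."""
    pr_id: int
    qa_passed: bool = False   # set by a human at the QA gate
    uat_passed: bool = False  # set by a human at the UAT gate


class ReleasePipeline:
    """Sketch: auto-merge everything; release only what passes QA and UAT."""

    def __init__(self) -> None:
        self.merged: List[Change] = []

    def auto_merge(self, pr_id: int) -> Change:
        # No human code review here: the PR is merged automatically.
        change = Change(pr_id)
        self.merged.append(change)
        return change

    def releasable(self) -> List[int]:
        # A change only ships once both human gates have signed off.
        return [c.pr_id for c in self.merged if c.qa_passed and c.uat_passed]
```

The design choice this illustrates: the Human-in-the-Loop moves from the per-commit review to the release boundary, so releases become irregular but each one has passed both verification gates.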
