We have been busy cooking up some AI-powered features these last few months and are excited to launch them this week!
This blog post aims to highlight the differences we've felt when running projects for AI-heavy features relative to our usual ways of working.
Going into these projects, we were often unsure whether LLMs like OpenAI's GPT models were capable of what we wanted them to do. For example, would a feature like suggested follow-ups be able to spot what constitutes a follow-up? Would it be overeager to suggest one? Would it produce good results consistently?
By starting projects like these with an experimental approach, we focus on proving that we can consistently produce results that are good enough for customers to gain value from. This also gives us clear points where we can ask ourselves whether we should continue, rethink the approach, or "park it" and focus on something with a better ROI.
We will cover a few ways to implement this approach in the next section. For a concrete example, head over to the lessons we learnt when building suggested summaries.
Starting with a proof of concept, we want to get as quickly as possible to a point where we understand what GPT is capable of and how it fits into the bigger picture of the feature (i.e. how users will interact with it).
With speed being crucial, fast iteration cycles quickly pay for themselves. For example, we built a command to test prompt changes against a historical dataset and see whether they improve the output.
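As a rough illustration, a command like this can stay very simple: load historical examples, run them through the prompt you are iterating on, and print the results for a human to eyeball. This is a minimal sketch, not our actual tooling; the dataset format, prompt builder, and model name are assumptions made for the example.

```python
# prompt_eval.py - a minimal sketch of a command for testing prompt changes
# against a historical dataset. The file format, prompt builder, and model
# name are illustrative assumptions, not incident.io's actual tooling.
import json
import sys

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def build_prompt(incident: dict) -> str:
    """Hypothetical prompt builder: swap in whichever prompt you're iterating on."""
    return (
        "Suggest a concise follow-up action for this incident, "
        "or reply 'NONE' if no follow-up is needed.\n\n"
        f"Incident summary: {incident['summary']}"
    )


def main(dataset_path: str) -> None:
    # One JSON object per line, e.g. {"id": "INC-123", "summary": "...", "expected": "..."}
    with open(dataset_path) as f:
        incidents = [json.loads(line) for line in f]

    for incident in incidents:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": build_prompt(incident)}],
        )
        output = response.choices[0].message.content
        # Print expected vs actual side by side so a human can judge whether
        # the prompt change helped.
        print(f"--- {incident['id']} ---")
        print(f"Expected: {incident.get('expected', 'n/a')}")
        print(f"Got:      {output}")


if __name__ == "__main__":
    main(sys.argv[1])
```

Running it after each prompt tweak gives a fast, repeatable comparison without having to trigger the feature end to end.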
Although we may have an idea of what the feature would look like at this stage, we often intentionally do not build a UI until later. We want to make sure we fully focus on understanding what is possible with LLMs.
Once we are confident that the approach will yield acceptable results, we want to build an early MVP that we can start using for ourselves in production.
This version is intentionally unpolished, but it gives us something we can extrapolate from to understand what the final version will feel like. An example of this is when we were building an AI Assistant to help understand historical incident data. This helped us answer some questions, as well as uncover some unknown unknowns:
By getting it into production for the rest of the company (a.k.a. dogfooding), we take an incremental step towards seeing what people expect from the feature. We also gain more valuable feedback, which helps guide us towards a better solution before we spend a lot of time building everything else.
The key part of iterating on AI-heavy features is refining the system and user prompts that are passed into the LLM. We have a separate technical guide here on prompt engineering, but wanted to touch on an easy mistake that could cost you a lot of time: not knowing when to stop refining the prompts.
Don't let an imperfect prompt stop you from releasing to a wider (but still small) audience, because this will stop you from getting a lot of valuable feedback early. Feedback will also most likely lead to changes in your prompt, so spending lots of time to perfect it may not be a worthwhile endeavour.
Here are a few indicators that it's probably time to stop refining the prompt:
If your prompt only yields good results in certain scenarios, consider skipping your feature when you know it'll perform poorly. This might be easier than refining the prompt to handle all scenarios. An example of this was our feature that surfaces related incidents during a live incident, where we explicitly excluded incidents with minimal information (such as declined incidents) to avoid showing false positives.
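In code, this can be as small as a guard clause that bails out before calling the LLM. The field names, status values, and threshold below are assumptions for illustration, not our actual schema.

```python
# A sketch of the "skip when we know it'll perform poorly" idea: bail out before
# calling the LLM for incidents that don't carry enough signal. Field names,
# the "declined" status check, and the length threshold are illustrative assumptions.
MIN_SUMMARY_LENGTH = 50


def should_suggest_related_incidents(incident: dict) -> bool:
    # Declined incidents rarely have enough detail to match against reliably.
    if incident.get("status") == "declined":
        return False
    # Very short summaries tend to produce false-positive matches.
    if len(incident.get("summary", "")) < MIN_SUMMARY_LENGTH:
        return False
    return True
```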
Once it's in the wild, it is important to understand how your feature is working. Establishing a feedback loop for an AI-heavy feature is an easy, high-signal way of knowing how customers are using the tool and how you could improve it. Here are some of the ways we have built this feedback loop.
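One lightweight way to picture such a loop is recording whether users accept, edit, or reject each suggestion, then graphing the acceptance rate over time. The sketch below is hypothetical; the data model and in-memory store are stand-ins for whatever persistence and analytics you already have.

```python
# A sketch of capturing explicit feedback on AI suggestions so we can track how
# a feature performs in the wild. The data model and in-memory store are
# illustrative assumptions, not incident.io's actual schema.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SuggestionFeedback:
    suggestion_id: str
    accepted: bool          # did the user accept the suggestion as-is?
    edited: bool            # or did they tweak it before accepting?
    submitted_at: datetime


# Stand-in for a database table or analytics event stream.
FEEDBACK_LOG: list[SuggestionFeedback] = []


def record_feedback(suggestion_id: str, accepted: bool, edited: bool) -> SuggestionFeedback:
    feedback = SuggestionFeedback(
        suggestion_id=suggestion_id,
        accepted=accepted,
        edited=edited,
        submitted_at=datetime.now(timezone.utc),
    )
    FEEDBACK_LOG.append(feedback)
    return feedback


def acceptance_rate() -> float:
    """Share of suggestions accepted outright - the kind of graph worth watching."""
    if not FEEDBACK_LOG:
        return 0.0
    return sum(f.accepted for f in FEEDBACK_LOG) / len(FEEDBACK_LOG)
```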
Leave some time for feedback to come through, then distill it into actionable changes that improve the feature. Rinse and repeat, and watch your graphs go up (or down)!
Overall, there are a few key differences to note when running projects that involve AI-heavy features.