I recently launched wishlist palace, a web app to help friends and family share birthday and Christmas lists. This app is my first exploration into augmented coding (a term I prefer over vibe coding).
When I first heard of vibe coding, I wasn’t particularly interested in adopting it as a mode of production myself. I enjoy the problem-solving aspect of software engineering, and at face value much of that process falls into the lap of the LLM when doing vibe coding. And it’s not only the solution of the problem at hand that interests me: directly engaging with a problem and phrasing it in the formalisms of programming languages scratches an itch in the same way I imagine poets feel when they want to express themselves in poetry. The trouble with the situation we (as tech folk) are in now is that the terms (like vibe coding and augmented coding) are highly overloaded. They are used to describe vastly different development cycles, from the one-shot prompt of 100 words attempting to get a polished app, to the very detailed prompt of a few hundred words to obtain a 10-line fix.
Accompanying these observations are claims that LLMs are unlocking an (insert large integer) times speedup in product development, claims that the landscape of software development has fundamentally shifted, that there will be no jobs, and that we are on the cusp of fully automating the production of software for all use cases. Then there are people I respect deeply saying they will never touch LLMs and will categorically reject any code which smells as if it was touched by generative AI. Not to mention the important ethical considerations that have been raised about AI use. All this conflicting information has left me with a bit of confusion.
Rather than try to parse what is globally true about LLMs and their use to generate software, I thought I’d see how I could use them to solve a problem I have (and a somewhat mundane problem at that). Then, with this as at least one data point, I can start to see where I land on the vast spectrum of AI/LLM assessment.
Choosing frameworks
The choice of tech stack will have a big impact on what you can expect from the LLM. For wishlist palace I went with technologies I have some knowledge of and whose philosophy I like:
- Elixir for functional programming and its mix build automation tool
- Ash Framework for domain modeling and for its ash_authentication library
- Phoenix for the web server and for its LiveView for realtime updates to wish lists
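As a rough sketch of what this stack looks like in practice (version requirements here are illustrative, not pinned recommendations — check hex.pm for current releases), the dependency list in mix.exs would be along these lines:

```elixir
# mix.exs — dependency sketch; versions are illustrative
defp deps do
  [
    {:phoenix, "~> 1.7"},
    {:phoenix_live_view, "~> 1.0"},
    {:ash, "~> 3.0"},
    {:ash_postgres, "~> 2.0"},
    {:ash_authentication, "~> 4.0"}
  ]
end
```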
In theory an LLM could generate a bespoke programming language for your project. But just as humans
often choose to leverage what has already worked, because they can benefit from what others have developed,
so too there is an obvious benefit to choosing what has been proven to work when doing augmented coding.
Since the popularity of a framework or language is a proxy for how much training data the LLM had access
to, I think it’s also important to make a choice based on systems you like to work with. For me that’s functional
programming. I have learned some Elixir doing side projects, but I’ve never released a web app in the
language. I chose Elixir because I am interested in the BEAM virtual machine
and its cool features like hot code loading and its concurrency model. So I was a bit familiar with
Elixir and had used the web framework Phoenix a bit before (but never in production), so they were
reasonable candidates for the prospect of evaluating the generated output from the LLM.
There is a big difference between generating code in a language or framework in which you have close to
zero context yourself and generating it in one in which you have deep knowledge. In my professional
work I use Python primarily and work on backend systems. I feel much more confident evaluating LLM-generated
plans and code in that scenario, given that I simply have much more experience building with those tools.
I have much less experience building front end features. I’m still a bit scared to modify CSS files, and JS is not
a strong suit of mine. So I knew I was going to have a harder time evaluating the quality of the front
end components generated by an LLM. But Phoenix has HEEx (HTML + Embedded Elixir), which is not such
a stretch for me: I do know some HTML (though certainly not all of its features).
Start with core data structures
A strategy I think is helpful is to first build the core data structures, and the transformations on those data structures, that your app will require. This way you can reason about whether you have the right data structures before jumping into designing the user experience.
This strategy fits well with the philosophy of Ash Framework, which is well suited to
domain-driven design. I like Ash because it gives you an easy way to declare what the shape of the data
looks like (as attributes), the transformations that can be applied to it (as actions), and how it
will be persisted (with Ash’s Repo abstraction).
So I first built the concepts of List and Item, which are, funnily enough, very similar to the classic
TODO list app that is usually the subject of a lot of language and framework tutorials!
For the key actions that need to be applied to lists there are the classic CRUD ones, but I also wanted to add the notion of having a user “claim” an item. This is important for birthday and Christmas lists so that gift givers don’t end up buying the same gift. These were the first attributes and actions for these items that depart from bog-standard CRUD operations on data.
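To make this concrete, here is a hypothetical sketch of what an Item resource with a claim action looks like in Ash (module, domain, and field names are my illustrations, not the actual wishlist palace code):

```elixir
# Hypothetical Ash resource sketch; names are illustrative.
defmodule WishlistPalace.Wishlists.Item do
  use Ash.Resource,
    domain: WishlistPalace.Wishlists,
    data_layer: AshPostgres.DataLayer

  attributes do
    uuid_primary_key :id
    attribute :name, :string, allow_nil?: false
    # Who has claimed this item, if anyone.
    attribute :claimed_by_id, :uuid
  end

  actions do
    defaults [:read, :destroy, create: :*, update: :*]

    # The non-CRUD action: a user claims an item so other
    # gift givers can see it is already taken.
    update :claim do
      argument :user_id, :uuid, allow_nil?: false
      change set_attribute(:claimed_by_id, arg(:user_id))
    end
  end
end
```

The nice property is that the claim rule lives on the resource itself, so every interface (LiveView, API, tests) goes through the same action.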
Prompt for small features
The ideal time to introduce an LLM tool to develop features is after you have the core data structures, and the ways to manipulate them, at least somewhat complete. The data structures and the actions you want to take on them may evolve over time, but it’s difficult to describe both a desired feature and a proper data model in a single prompt, which is why I like to do it in this order.
Then, in my experience, it is best not to prompt for a jump straight to the finished product you have in mind. Try to break up the jump to your vision of the web app into smaller chunks. This is of course no surprise to people who have been developing software for a while, and it is a best practice not because it is written in a book somewhere but because it makes each change easier to reason about, for both humans and machines.
But just because you want to break up the changes into discrete steps doesn’t mean they have to be tiny in terms of line count or even in terms of impact. Just conceptually smallish! In my case, one of the first additional features I wanted to develop after the data foundation had been laid was to add users and the Phoenix views to be able to create lists and add items.
On the one hand this was a big change from what I already had: I hadn’t set up Phoenix yet, I didn’t have a users abstraction, and I didn’t have any of the front end templates set up. But the quality of the current models is such that you can prompt in a way that leverages the frameworks you have chosen. Ash Framework and Phoenix are known to work well together, and Phoenix is very much ready to support the notion of users wanting to do CRUD actions on data. So even though there were quite a few lines that needed to be written, conceptually this was not a large jump, and the model handled it well.
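For a sense of scale, the routing side of a change like this is very conventional Phoenix; a sketch of what it might look like (paths, pipeline, and LiveView module names are hypothetical):

```elixir
# Hypothetical router sketch; module and path names are illustrative.
scope "/", WishlistPalaceWeb do
  pipe_through [:browser, :require_authenticated_user]

  # LiveView routes for creating lists and adding items.
  live "/lists", ListLive.Index, :index
  live "/lists/new", ListLive.Index, :new
  live "/lists/:id", ListLive.Show, :show
end
```

Because this shape is so well trodden in Phoenix codebases, it is exactly the kind of code current models generate reliably.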
Test as you go
After you have a chunk of functionality up, using classic testing practices is a good way to verify things
are going well. I used a mix of unit tests and “integration” tests against a locally running db to make
sure the functional behavior was correct. In addition, Phoenix has excellent tooling for running the
app locally, so it’s easy to verify user workflows from the locally running app in your own browser.
I don’t think there is too much difference in how to test for hand-written versus augmented coding techniques. The main question is how much you rely on the model to write the tests. I find that it’s best to really interrogate the model about the usefulness of each test, as models tend to write a lot of tests that don’t exercise any meaningful behavior of the system. Fewer but more impactful tests are better than 100 tests that don’t really catch regressions or are annoying to update. Again, this is a classic insight from software engineering that is probably widely known.
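A sketch of the kind of focused, behavior-level test I mean, using ExUnit (the module, helpers, and action name here are hypothetical illustrations):

```elixir
# Hypothetical ExUnit sketch; module names and helpers are illustrative.
defmodule WishlistPalace.WishlistsTest do
  use WishlistPalace.DataCase, async: true

  test "claiming an item records who claimed it" do
    user = create_user()                    # hypothetical test fixture helper
    item = create_item(name: "Board game")  # hypothetical test fixture helper

    # Exercise the claim action through the same Ash interface the app uses.
    {:ok, claimed} =
      item
      |> Ash.Changeset.for_update(:claim, %{user_id: user.id})
      |> Ash.update()

    assert claimed.claimed_by_id == user.id
  end
end
```

One test like this, asserting on the behavior users actually care about, catches more regressions than a pile of tests that merely restate the schema.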
Learning by doing?
One of the goals I had with this project was to teach myself more about the front end bits of Phoenix. The classic approach to learning a new technology is to build something with it; this naturally takes you on a journey of discovering how to use the thing. But with LLMs, I notice that learning by doing is much less impactful than doing something by hand. I think this is a large trade-off. Of course I ended up with a functional program much faster with LLM usage, but I don’t feel deeply knowledgeable about HEEx yet. I know more than I did to start with, but not as much as if I had done it “the old fashioned way.” So the learning outcomes of building this project were not as good as they would have been with other methods.