Recent Changes
Here is a list of recent updates to the site. You can also get this
information as an RSS feed, and I announce new
articles on the Fediverse (Mastodon),
Bluesky,
LinkedIn, and
X (Twitter).
I use this page to list both new articles and additions to existing
articles. Since I often publish articles in installments, many entries on this
page will be new installments to recently published articles; such
announcements are indented and don't show up in the recent changes sections of
my home page.
Tue 18 Feb 2025 07:02 +07
Recent LLM models have provided “reasoning” capabilities. Birgitta
Böckeler asks what role these play with coding tasks. She
doesn't have an answer, but does have questions and thoughts - and has
not found such capabilities worthwhile so far.
more…
Thu 13 Feb 2025 21:16
LLMs struggle with large amounts of context. Bharani
Subramaniam and I explain how to mitigate this common RAG
problem with a Reranker, which takes the document
fragments from the retriever and ranks them according to their usefulness.
more…
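To make the reranking idea concrete, here is a minimal sketch in Python. It is not the implementation from the article: `overlap_score` is a hypothetical stand-in that counts shared query terms, where a real reranker would more likely use a cross-encoder model or an LLM to judge relevance. The `rerank` function reorders the retriever's fragments by that score and keeps only the best `top_k`.

```python
from typing import Callable


def overlap_score(query: str, fragment: str) -> float:
    """Toy relevance score: the fraction of query terms found in the fragment.

    Hypothetical stand-in for a cross-encoder or LLM-based scorer.
    """
    query_terms = set(query.lower().split())
    fragment_terms = set(fragment.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & fragment_terms) / len(query_terms)


def rerank(
    query: str,
    fragments: list[str],
    top_k: int = 3,
    score: Callable[[str, str], float] = overlap_score,
) -> list[str]:
    """Reorder retriever output by estimated usefulness and keep the best top_k."""
    ranked = sorted(fragments, key=lambda f: score(query, f), reverse=True)
    return ranked[:top_k]


# Hypothetical fragments as they might come back from a retriever.
retrieved = [
    "the cat sat on the mat",
    "refund policy for annual plans",
    "how to request a refund",
]
print(rerank("how do I get a refund", retrieved, top_k=2))
# → ['how to request a refund', 'refund policy for annual plans']
```

Keeping `score` as a parameter means the toy scorer can be swapped for a model-based one without touching the surrounding retrieval code.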
Wed 12 Feb 2025 07:58
Users often have difficulty writing the most effective queries.
Bharani Subramaniam and I explain Query Rewriting:
getting an LLM to formulate alternative queries to send to a RAG's
retriever.
more…
Thu 06 Feb 2025 09:17
The appearance of DeepSeek Large Language Models has caused a lot of
discussion and angst since their latest versions appeared at the beginning
of 2025. But much of the value of DeepSeek's work comes from the papers
they have published over the last year. Shayan Mohanty
provides an overview of these papers, highlighting three main arcs in this
research: a focus on improving cost and memory efficiency, the use of HPC
Co-Design to train large models on limited hardware, and the development
of emergent reasoning from large-scale reinforcement learning.
more…
Wed 05 Feb 2025 10:03
Today Bharani Subramaniam and I outline four
limitations of the simple RAG from yesterday, and the pattern that
addresses the first of these: Hybrid Retriever. This tackles the
inefficiencies of embeddings-based search by combining it with other
search techniques.
more…
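One common way to combine the results of an embeddings-based search with a keyword search is reciprocal rank fusion. The sketch below assumes that technique, which may differ from what the article describes; the document ids and the constant `k = 60` are illustrative. Each retriever contributes a ranked list, and documents that rank well in several lists rise to the top.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    with ranks starting at 1; a larger k flattens the advantage of top positions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from two retrievers over the same document collection.
vector_hits = ["d3", "d1", "d2"]   # embeddings-based similarity search
keyword_hits = ["d1", "d4", "d3"]  # keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # → ['d1', 'd3', 'd4', 'd2']
```

Because the fusion only looks at ranks, it avoids having to reconcile similarity scores and keyword scores that live on different scales.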
Tue 04 Feb 2025 10:23
I was on a panel at GOTO Copenhagen last September with Holly Cummings,
Trisha Gee, Dave Farley, and Daniel Terhorst-North. We discussed the
current state of software development and where it was heading. Given the
timing, there was much discussion about the role AI would play in our
profession's future.
more…
Tue 04 Feb 2025 09:57
A pre-trained GenAI model lacks recent and specific information about a
domain. Bharani Subramaniam and I explain how Retrieval
Augmented Generation (RAG) can fill that gap.
more…
Thu 30 Jan 2025 00:00 +07
The Forest and the Desert is a metaphor for thinking about software
development processes, developed by Beth Andres-Beck and hir father Kent Beck.
It posits that two communities of software developers have great difficulty
communicating with each other because they live in very different contexts, so
advice that applies to one sounds like nonsense to the other.
The desert is the common world of software development, where bugs are
plentiful, skill isn't cultivated, and communication with users is difficult.
The forest is the world of a well-run team that uses something like Extreme Programming, where developers swiftly put changes into
production, protected by their tests, code is invested in to keep it healthy,
and there is regular contact with The Customer.
Clearly Beth and Kent prefer The Forest (as do I). But the metaphor is more
about how descriptions of The Forest, and the advice for how to work there, often
sound nonsensical to those whose only experience is The Desert. It reminds us
that any lessons we draw about software development practice, or architectural
patterns, are governed by the context in which we experienced them. It is possible
to change Desert into Forest, but it's difficult - often requiring people to do
things that are both hard and counter-intuitive. (It seems sadly easier for
The Forest to submit to desertification.)
In this framing I'm definitely a Forest Dweller, and seek with Thoughtworks
to cultivate a healthy forest for us and our clients. I work to explain The Forest to Desert
Dwellers, and help my fellow Forest Dwellers to make their forest even more
plentiful.
Acknowledgements
Kent Beck supplied the image, which he may have painstakingly drawn pixel by
pixel. Or not.
Wed 29 Jan 2025 10:55
GenAI systems, like many modern AI approaches, have to handle vast
quantities of data, and find similarities between elements in an image or
chunk of words. Bharani Subramaniam and I describe a key tool
to do this - Embeddings - transforming large data blocks into
numeric vectors so that embeddings near each other represent related
concepts.
more…
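A small sketch of what "near each other" can mean, using cosine similarity, a common measure for comparing embedding vectors. The words and three-dimensional vectors here are invented for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings, invented for illustration.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "invoice": [0.1, 0.2, 0.95],
}


def nearest(word: str) -> str:
    """Return the other word whose embedding is most similar to this word's."""
    target = embeddings[word]
    candidates = [w for w in embeddings if w != word]
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


print(nearest("cat"))  # → kitten
```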
Tue 28 Jan 2025 07:43
Everyone is fascinated by using generative AI these days, and my
colleagues are no exception. Some of them have had the opportunity to put
these systems into practice, both as proofs of concept and, more importantly,
as production systems. I've known Bharani Subramaniam for
many years as a technology leader in India. He's been assembling the
lessons we've learned, and I've worked with him to describe them as
patterns.
In this first installment, we look at the limits of the base case of
Direct Prompting, and how we might assess the capability of a system using
Evals.
more…
Fri 24 Jan 2025 13:01
Luca Rossi hosts a podcast (and newsletter) called Refactoring, so it's
obvious that we have some interests in common. The title comes from me
leaning heavily on Beth Andres-Beck and Kent Beck's metaphor of The Forest and The Desert. We talk
about the impact of AI on software development, the metaphor of technical
debt, and the current state of agile software development.
more…
Wed 22 Jan 2025 10:54
Juntao Qiu finishes his description of codemods by
looking at some other approaches outside the JavaScript world such as
JavaParser and OpenRewrite.
more…
Wed 15 Jan 2025 11:43
So far the codemods that Juntao Qiu has described are
fascinating, but rather straightforward. Real codebases offer more
challenges. In this installment, he goes into how to tackle more
complicated cases by composing codemods.
more…
Thu 09 Jan 2025 09:49
I've got into the habit of starting the New Year by sharing six
favorite albums I discovered during the last year. This year's set includes
Celtic jazz, trip-hop neo-fado, jazz-fusion for the 2020s, playful
harmonies, and a vibrant collaboration between Indian classical musicians and a
string quartet. (I was also unable to limit myself to six.)
more…
Wed 08 Jan 2025 09:44
Juntao Qiu moves on to a more complex example of a
codemod, one that extracts a tooltip responsibility from a JSX component.
This illustrates how to manipulate the AST in several steps.
more…
Tue 07 Jan 2025 09:34
As a library developer, you may create a popular utility that hundreds
of thousands of developers rely on daily, such as lodash or React. Over
time, usage patterns might emerge that go beyond your initial design. When
this happens, you may need to extend an API by adding parameters or
modifying function signatures to fix edge cases. The challenge lies in
rolling out these breaking changes without disrupting your users’
workflows. Juntao Qiu begins an article to explain how we
can use codemods to tackle this. Codemods are tools that automate
large-scale code transformations, allowing developers to introduce
breaking API changes and refactor legacy codebases.
more…
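The article's examples target JavaScript, but the underlying idea of parsing code into an AST, transforming it, and printing it back is language-agnostic. As a rough sketch, here is a codemod written with Python's standard `ast` module that adds a new keyword argument to every call of a given function, so existing callers keep their old behavior after a signature change. The function name `format_date` and the `locale` argument are hypothetical. Note that `ast.unparse` (Python 3.9+) discards comments and original formatting, which dedicated codemod tools work hard to preserve.

```python
import ast


class AddKeywordArgument(ast.NodeTransformer):
    """Add `keyword=value` to every call of `func_name` that doesn't already pass it."""

    def __init__(self, func_name: str, keyword: str, value: object) -> None:
        self.func_name = func_name
        self.keyword = keyword
        self.value = value

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # transform any calls nested in the arguments first
        is_target = isinstance(node.func, ast.Name) and node.func.id == self.func_name
        already_passed = any(kw.arg == self.keyword for kw in node.keywords)
        if is_target and not already_passed:
            node.keywords.append(
                ast.keyword(arg=self.keyword, value=ast.Constant(value=self.value))
            )
        return node


def run_codemod(source: str, func_name: str, keyword: str, value: object) -> str:
    """Parse source, apply the transformation, and print the AST back as code."""
    tree = ast.parse(source)
    tree = AddKeywordArgument(func_name, keyword, value).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


# Hypothetical call sites: the second one already passes the new argument.
before = "label = format_date(d)\nother = format_date(d, locale='fr-FR')\n"
print(run_codemod(before, "format_date", "locale", "en-US"))
# → label = format_date(d, locale='en-US')
#   other = format_date(d, locale='fr-FR')
```

Skipping calls that already pass the argument keeps the codemod safe to run repeatedly over a codebase.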
Thu 12 Dec 2024 10:36
Design tokens are fundamental design decisions represented as data.
Andreas Kutschmann explains how they work
and how to organize them to balance scalability, maintainability and
developer experience.
more…
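As an illustration of design decisions represented as data, here is a minimal sketch of a token set in which semantic tokens refer to primitive ones. The token names, values, and the `{token.name}` reference syntax are assumptions for illustration, loosely modeled on the curly-brace alias style some token tools use; they are not necessarily the organization the article recommends.

```python
import re

# Hypothetical token set: primitive tokens hold raw values, while semantic
# tokens refer to other tokens with a "{token.name}" reference.
TOKENS: dict[str, str] = {
    "color.blue.500": "#2563eb",
    "color.white": "#ffffff",
    "color.action.primary": "{color.blue.500}",
    "button.primary.background": "{color.action.primary}",
    "button.primary.text": "{color.white}",
}

REFERENCE = re.compile(r"^\{(.+)\}$")


def resolve(name: str, tokens: dict[str, str], seen: frozenset[str] = frozenset()) -> str:
    """Follow references until reaching a raw value, rejecting cycles."""
    if name in seen:
        raise ValueError(f"circular token reference involving {name!r}")
    value = tokens[name]
    match = REFERENCE.match(value)
    if match:
        return resolve(match.group(1), tokens, seen | {name})
    return value


print(resolve("button.primary.background", TOKENS))  # → #2563eb
```

With this layering, changing the brand blue means editing one primitive value, and every semantic token that refers to it follows.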
Tue 10 Dec 2024 15:22
Once we've designed our initial data products, Kiran
Prakash finishes his article by leading us through the
next steps: identifying common patterns, improving the developer
experience, and handling governance.
more…
Wed 04 Dec 2024 10:36
Having got an initial data product, Kiran Prakash
leads us through the next steps: covering similar use cases to
generalize the data product, determining which domains the products fit
into, and considering service level objectives.
more…
Tue 03 Dec 2024 09:04
Increasingly the industry is seeing the value of creating data
products as a core organizing principle for analytic data. Kiran
Prakash has helped many clients design their data products, and
shares what he's learned. In particular, his methodical approach doesn't
begin by thinking about some data that might be handy to share, but
instead works from what consumers of a data product need.
more…
Tue 19 Nov 2024 10:17
A very powerful new coding assistance feature made its way into GitHub
Copilot at the end of October. This new “multi-file editing” capability
expands the scope of AI assistance from small, localized suggestions to
larger implementations across multiple files. Birgitta
Böckeler tries out this new capability and finds
out how useful its changes tend to be, and wonders about what feedback
loops are needed with them.
more…
Wed 13 Nov 2024 12:33
With the recent uptick in tech activity on Bluesky, I've decided that
I will start posting there in addition to
my current locations.
I've also put together my general thoughts on the state of social
media, and how I'm using it, now that it's two years since The Muskover.
more…
Tue 05 Nov 2024 08:55
Matthew Foster and John Mikel Amiel
Regida finish their account of how they incrementally modernized
a mobile application by looking at the results of their work. They
achieved a significant shortening of time to new value, and found that
changes in the new application could be prepared in about half the time
it took on the old codebase.
more…
Wed 30 Oct 2024 10:29
Matthew Foster and John Mikel Amiel
Regida dive into the details of incrementally modernizing a
legacy mobile application. They look at how to implant the strangler fig
into the existing app, setting up bi-directional communication between
the new app and the legacy app, and ensuring effective regression testing on the
overall system.
more…
Tue 29 Oct 2024 10:34
My colleagues are often involved in modernizing legacy systems, and
our approach is to do this in an incremental fashion. Doing this with a
mobile application raises some specific challenges. Matthew
Foster and John Mikel Amiel Regida share their
experiences of a recent engagement to do this, shifting from a monolithic
legacy application to one using a modular micro-app architecture.
more…
Fri 04 Oct 2024 09:16
I was interviewed on the Book Overflow podcast about the Refactoring
book. We talked about the origins of the book, the relationship
between refactoring, testing, and extreme programming, how refactoring is
used in the wild, and the role of books and long-form prose today.
more…
Tue 24 Sep 2024 09:51
Alessio Ferri, Tom Coggrave, and
Shodhan Sheth complete their article on what they have
learned from using GenAI with legacy systems. They describe how GenAI's
ability to process unstructured information makes it much easier to build
a capability map of a legacy system, tying the capabilities of a system
to the relevant parts of the source code. GenAI is also useful for
identifying areas of dead code, and has promise for better translations
of a system between platforms and languages.
more…
Wed 18 Sep 2024 08:08
Alessio Ferri, Tom Coggrave, and
Shodhan Sheth use their combination of an AST-fueled
knowledge graph and LLMs to gain understanding of legacy systems. They
have found it helps them both to extract the low-level details of the
code and to provide high-level explanations that support human-centered
approaches such as event storming.
more…
Tue 17 Sep 2024 09:07
Most of the talk about the impact of GenAI on software development is
about its ability to write (messy) code. But many of us think it's going
to be much more useful to help us understand existing messy code,
as part of a modernization effort. My colleagues Alessio
Ferri, Tom Coggrave, and Shodhan
Sheth have been considering how GenAI can do this,
including building an internal tool to help explore the possibilities.
The tool uses an LLM to enhance a knowledge graph based on the AST of the
code base. It also uses an LLM to help users query this knowledge graph.
more…
Thu 05 Sep 2024 09:37
Decentralized data management requires automation to scale governance
effectively. Fitness functions are a powerful automated governance
technique my colleagues have applied to data products within the context
of a Data Mesh. Since data products serve as the foundational building
blocks of a data strategy, ensuring robust governance around them
significantly increases the chances of success. Kiran
Prakash explains how to do this, starting with simple tests for
key architectural characteristics and moving on to leveraging metadata
and Large Language Models.
more…
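A sketch of the simplest kind of check mentioned above: a fitness function that verifies each data product declares some required metadata. The `DataProduct` class, the field names, and the catalog are hypothetical; in practice such checks would run in a pipeline against whatever metadata the data mesh platform actually exposes.

```python
from dataclasses import dataclass, field

# Hypothetical metadata every data product is expected to declare.
REQUIRED_FIELDS = ("owner", "description", "freshness_slo_hours")


@dataclass
class DataProduct:
    name: str
    metadata: dict[str, object] = field(default_factory=dict)


def missing_metadata(product: DataProduct) -> list[str]:
    """Fitness function: required fields that are absent, None, or an empty string."""
    return [f for f in REQUIRED_FIELDS if product.metadata.get(f) in (None, "")]


def check_catalog(products: list[DataProduct]) -> dict[str, list[str]]:
    """Run the fitness function over a catalog; only failing products are reported."""
    return {p.name: gaps for p in products if (gaps := missing_metadata(p))}


catalog = [
    DataProduct("orders", {"owner": "sales-team", "description": "Daily orders",
                           "freshness_slo_hours": 24}),
    DataProduct("clicks", {"owner": "web-team"}),
]
print(check_catalog(catalog))  # → {'clicks': ['description', 'freshness_slo_hours']}
```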