
Snapshot Through the Heart

Published a year ago
8 min to read

While snapshot testing has been around for a while in the form of visual snapshots (used in visual regression testing), it's clear that the introduction of textual snapshots in Jest a few years ago had a big impact on testing, not only in JavaScript but in other languages as well. But looking back on what it brought me a few years later, I feel rather failed by snapshots. And while most of the blame is on me, I learned interesting things along the way about snapshots and how to use them properly. Things that seem evident now, but that I had to fail at before they stuck.

Consider this a postmortem of my (failed) usage of snapshots. But let's back up a bit: what even are snapshots, and how did they come to be so prevalent?

The rise of snapshots

Not every language is born equal when it comes to test tooling. On the one hand you have languages like PHP, which has been dominated by a single testing framework for close to 20 years now – everybody agrees PHPUnit is the tool to use, and while there are alternatives, it remains the de facto standard.

And then you have languages like JavaScript, where testing tools not only thrived but multiplied at the pace of wet gremlins. If you've been testing for some time in this language, you'll have seen frameworks come and go, and shift in popularity almost as much as build tools did. This is compounded by the fact that JavaScript has not only a vast offering of test runners, but also of assertion libraries, plugins and so on. And most often you could mix and match those at will to create the perfect testing library for your needs. But as far as I can remember – and maybe I am just bad at my job – this has always brought on more pain than it soothed. Having the community fractured around dozens of tools meant that advances made in one camp would not necessarily cascade to the others, and that each individual cog in the machine had its own ecosystem around it, varying in size with its popularity at a given moment. Additionally, tying these individual pieces together was often not as easy as it sounds, with niche edge cases arising from unexpected combinations of libraries, or one library supporting a given feature only when used with another... it was a troubled time.

But then something happened a few years ago: Jest was released. And while it is by no means the test framework to end all test frameworks, it's undeniable that its arrival metamorphosed the JavaScript testing scene in a way not dissimilar to what Prettier did to code formatters. Jest was fast, required little to no configuration, supported all the bells and whistles of modern testing out of the box, was backed by a big player, and was an all-around breath of fresh air at a time when most alternatives required a lot of boilerplate or configuration to achieve the same thing. Jest was considered innovative precisely because it was so opinionated, and at a time when a lot of people felt lost when it came to testing, it seemed like a light at the end of the tunnel. As such, the JavaScript community slowly crystallised around Jest, helped by its adoption in very popular tools like Create React App.

So when Jest released snapshot testing in 2016, it did so to an already very large user base. And more importantly, to a user base that wasn't entirely composed of people knowledgeable about testing, but also of people who were complete novices at it. And the way snapshot testing was presented at the time, it seemed like the holy grail we didn't know we had been waiting for. Suddenly you could test your components in a way that didn't really require you to actually test them: you could just drop one in, call toMatchSnapshot and call it a day. What an incredible advance, testing is now super easy, right?
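
To make that concrete, here is roughly what such a test looks like – a minimal sketch in the style of the Jest documentation, where Link stands in for whatever component you want to cover:

```js
// Link.test.js – a minimal Jest snapshot test.
// `Link` is a hypothetical component; react-test-renderer and
// toMatchSnapshot() are the standard Jest/React APIs for this.
import React from 'react';
import renderer from 'react-test-renderer';
import Link from './Link';

it('renders correctly', () => {
  const tree = renderer
    .create(<Link page="https://example.com">Example</Link>)
    .toJSON();

  // On the first run this writes a snapshot file; on later runs
  // the rendered output is diffed against it.
  expect(tree).toMatchSnapshot();
});
```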

https://jestjs.io/img/content/interactiveSnapshotDone.png

The snapshot fallacy

Of course, when Facebook released snapshot testing and explained they were using it extensively, they probably didn't anticipate that many people (lacking the background) would take it and run with it, completely missing the point of what it was designed for.

Snapshot tests, be they visual or textual, were first designed as a quality assurance tool. They are a way to easily and quickly check that, while working on a given feature, you aren't breaking things in obvious ways. They work by taking a snapshot of what a page or component looks like (or renders like) at a given moment, then checking future runs against that snapshot to warn you whether things still match.
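
Concretely, with Jest's textual snapshots, the "snapshot" is just a serialized render dumped into a file under __snapshots__, which future runs are diffed against. Something like this illustrative file for the hypothetical Link component above:

```js
// __snapshots__/Link.test.js.snap (illustrative content)
exports[`renders correctly 1`] = `
<a
  className="link"
  href="https://example.com"
>
  Example
</a>
`;
```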

But they were never really meant to replace tests; they were meant to complement them and catch the most obvious mistakes in a quick and clear manner. So much can go wrong past the initial render that thinking of snapshots as the only thing you need is plainly incorrect. Snapshots will catch bugs, obviously, but they will also let so much pass that they can never be entirely relied upon. Now here comes the problem: a lot of people, me included, were fully aware of that going in. Yet when I look at how I've used snapshots over the years, I can't help but notice I made that very same mistake much of the time.

When it becomes so easy to test any component in one simple line, it becomes easy to technically "test" everything, and it creates the false assumption that one snapshot test is better than no tests. That because you've ensured the component isn't going to break in a major way, every other kind of test suddenly loses some of its priority. It creates a false safety net with which you can say "I'll add more complete tests later, we have the snapshot tests in the meantime".

The reality, of course, is much starker, not only because snapshot tests aren't the silver bullet people make them out to be, but also because it's very easy to misuse these tests themselves. When Jest unveiled snapshot testing, they said clearly that it was something you had to pay attention to. Snapshots weren't a generate-and-forget kind of thing: the main goal was, when reviewing a pull request, to actively review the changed snapshots and see for yourself whether what changed was correct. But because snapshots are all or nothing, they also very, VERY often change in ways that are absolutely normal. Maybe you've changed a div to a section in your layout; suddenly your entire snapshot suite fails, so you just mass-update them and call it a day.
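
To illustrate: a change as innocuous as this one fails every snapshot that renders the component, even though nothing is actually broken – and the path of least resistance is a blanket update.

```js
// Layout.js – a purely structural change (the component is made up).
import React from 'react';

// Before:
//   <div className="layout">{children}</div>
function Layout({ children }) {
  return <section className="layout">{children}</section>;
}

export default Layout;

// Every snapshot containing <Layout> now fails. The tempting "fix"
// is to regenerate them all at once:
//   npx jest --updateSnapshot   (or `jest -u`, or `u` in watch mode)
```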

Once that has happened a good hundred times, you slowly lose the habit of actually checking your snapshots. You see that they've changed, update them, job done. Which, of course, defeats their entire purpose. Suddenly your safety net has holes so big you're not sure it even does anything anymore, and inevitably, something much more nefarious than a div change slips through the mesh of your test suite.

https://qanish.com/wp-content/uploads/2019/02/screenshot-difference-e1545051723765-1-656x300.png

Doubling Down

When I reached that point, I felt slightly betrayed by textual snapshots (even though they weren't at fault). The rational reaction would have been: "I need more actual tests instead of just snapshots". We already had a decent amount of behavioural tests (using React Testing Library); we just weren't writing nearly as many of them as we should have.
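
For contrast, here's a sketch of the kind of behavioural test I mean – the Counter component and its labels are made up for the example, the APIs are standard React Testing Library:

```js
import React from 'react';
import { render, screen, fireEvent } from '@testing-library/react';
import '@testing-library/jest-dom';
import Counter from './Counter'; // hypothetical component under test

it('increments the count when the button is clicked', () => {
  render(<Counter />);

  // Assert on behaviour the user cares about, not on the exact markup.
  fireEvent.click(screen.getByRole('button', { name: /increment/i }));

  expect(screen.getByText('Count: 1')).toBeInTheDocument();
});
```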

My reaction, instead, was to double down on my own mistake. I naively thought: "If textual snapshots failed me, maybe visual snapshots are the answer?". It made sense at the time: if the reason I didn't pay attention to text snapshots was that their changes were invisible, then taking a screenshot of the actual page would mean snapshots only failed on actual visible problems. So I started looking into the subject. At the time we had an extensive Storybook of all our components, so I decided to simply use the Puppeteer plugin to take a screenshot of every Storybook page and use those as visual snapshots.
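
The actual setup went through Storybook's Puppeteer plugin, but in spirit it amounts to something like this sketch – the story IDs are made up, localhost:6006 and the iframe.html?id= URL scheme are Storybook conventions, and jest-image-snapshot is used here for the image diffing:

```js
import puppeteer from 'puppeteer';
import { toMatchImageSnapshot } from 'jest-image-snapshot';

expect.extend({ toMatchImageSnapshot });

// Hypothetical story IDs – in practice the list came from Storybook itself.
const stories = ['button--primary', 'navbar--default'];

describe('visual snapshots', () => {
  let browser;
  beforeAll(async () => { browser = await puppeteer.launch(); });
  afterAll(async () => { await browser.close(); });

  it.each(stories)('story %s matches its snapshot', async (id) => {
    const page = await browser.newPage();
    await page.goto(`http://localhost:6006/iframe.html?id=${id}`);
    const screenshot = await page.screenshot();
    expect(screenshot).toMatchImageSnapshot();
  });
});
```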

But there's a reason textual snapshots took the spotlight off visual snapshots: the latter are much, much harder to pull off than I anticipated. Because you're comparing what pages or components look like once rendered, you're at the mercy of whichever rendering engine takes the snapshot, which OS it runs on, and so on. So when we started generating the screenshots locally on OSX and comparing them to the ones the CI would generate on Linux, they started failing: Linux renders fonts very slightly differently, and that 1% difference had huge ramifications in the way layouts overflow, text wraps, and so on.

Because fixing that properly would have required an even bigger time investment, and I was already too deep to back down, we decided to band-aid it by simply raising the failure threshold of the snapshots. You can probably see where this is going: we had to raise it so much to get past OS rendering differences that, in the end, the snapshots once again became ineffective at the task we had set them up for. Small important changes would fly below the failure threshold, while big unimportant changes (to the navigation bar, for example) would fail the whole suite.
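
With jest-image-snapshot, that band-aid is literally a one-option change, which is exactly what makes it so easy to overdo – the numbers here are illustrative, not the ones we used:

```js
// Tolerate rendering drift between OSes by raising the allowed difference.
// 5% is illustrative – ours crept up until the check barely meant anything.
expect(screenshot).toMatchImageSnapshot({
  failureThreshold: 0.05,          // up to 5% of pixels may differ...
  failureThresholdType: 'percent', // ...measured relative to the whole image
});
```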

We also ran into other difficulties setting this up; among other things, we had to make sure the screenshots were taken only after asynchronous actions and loaders had settled (because we were now in a real DOM instead of a shim of one).
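
In Puppeteer terms, that meant explicit waits before each screenshot, along these lines – the selectors are hypothetical, since they depend on how your app signals that loading is done:

```js
// A helper that waits for a page to settle before screenshotting
// (extends the Puppeteer sketch above).
async function settledScreenshot(page, url) {
  // Wait for network activity to stop before trusting what's on screen...
  await page.goto(url, { waitUntil: 'networkidle0' });

  // ...then wait for app-level loading states to disappear.
  await page.waitForSelector('.spinner', { hidden: true });

  return page.screenshot();
}
```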

Admitting defeat

Eventually the problem I mentioned (bugs passing through snapshots) happened, and it hit me like a ton of bricks. I had no "Option C", no further plan; the only thing I was left with was a bag of my own bad decisions. I started talking about the issue with colleagues, and the more I talked, the more I realized that the issue lay neither in textual nor in visual snapshots, but in the way we had been writing tests.

If there is one takeaway from this article, let it be this: snapshots are awesome. They may not always be easy to pull off, and they may not solve all your problems, but they do solve a lot of them, and I now consider them an incredibly useful part of a healthy test suite. Not a required one, but a welcome one. The emphasis is on healthy test suite: don't start with your snapshots, and don't even necessarily add them every time. But whenever you have to assert that large sets of data stay "put", whenever you want a visual gallery of all the pages in your application, whenever you think to yourself "I wish I could ensure this particular piece of logic/UI didn't change", then absolutely do use snapshots. They're here for that, and they're pretty good at their job by now.
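
The "large sets of data" case is where a plain textual snapshot really shines, because hand-writing the expected value would be miserable. Something like this sketch, where buildSitemap is a made-up function producing a big serializable structure:

```js
import { buildSitemap } from './sitemap'; // hypothetical function under test

it('keeps the generated sitemap stable', () => {
  // One assertion pins down a large structure; any change shows up as a
  // reviewable diff instead of a hand-maintained expected value.
  expect(buildSitemap()).toMatchSnapshot();
});
```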

But always make certain that the weight of ensuring your logic stays put doesn't rest on them. We have far better tools for that (acceptance tests, for example). Yes, they require more work, but ultimately they will save your ass in a million ways that snapshots cannot.

All this to say: don't be like me, don't use a screwdriver to hammer in a nail.

Use a hammer.

© 2020 - Emma Fabre
