Puppeteer Support for Firefox
cpeterso | 655 points | 11mon ago | hacks.mozilla.org
jesprenj|11mon ago
What I really dislike about current browser automation tools is that they all use TCP to connect the browser to the controlling program. Unlike with UNIX domain sockets, filesystem permissions (user/group restrictions) cannot be used to protect a TCP socket, which opens the browser automation ecosystem to attacks in any environment where 127.0.0.1 cannot be trusted (untrusted users on a shared host).
I have yet to see a browser automation tool that does not use localhost-bound TCP sockets. Beyond that, most tools do not offer strong authentication: a browser is spawned, it listens on a socket, and when the controlling application connects to the management socket, no authentication is required by default, which creates hidden vulnerabilities.
While existing browser sessions may only be controlled by knowing their random UUIDs, creating new sessions is usually possible for anyone on 127.0.0.1.
I don't really know, though; it's quite possible I'm just spreading lies here, so please correct me and expand on this topic a bit.
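To make the concern concrete, here is a minimal sketch (assuming a Chromium instance started with `--remote-debugging-port=9222`, and Node 18+ for the global `fetch`): the DevTools discovery endpoint on 127.0.0.1 answers without any authentication, so any local process that can reach the port can enumerate attachable targets.

```ts
// Minimal sketch: list debuggable targets of a locally running Chromium
// that was started with --remote-debugging-port=9222. No credentials are
// required by default, which is the concern described above.
interface DevToolsTarget {
  id: string;
  title: string;
  url: string;
  webSocketDebuggerUrl?: string;
}

async function listTargets(port = 9222): Promise<DevToolsTarget[]> {
  // /json/list is the standard DevTools discovery endpoint.
  const res = await fetch(`http://127.0.0.1:${port}/json/list`);
  return (await res.json()) as DevToolsTarget[];
}

listTargets().then((targets) => {
  for (const t of targets) {
    console.log(t.title, t.url, t.webSocketDebuggerUrl);
  }
});
```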
JoelEinbinder|11mon ago
You can set `pipe` to true in Puppeteer (it defaults to false): https://pptr.dev/api/puppeteer.launchoptions
By default, Playwright launches this way, and you have to specifically enable TCP listening.
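A minimal sketch of the pipe transport in Puppeteer; the `pipe` option is the documented one linked above, the rest of the snippet is just illustrative:

```ts
import puppeteer from "puppeteer";

(async () => {
  // With pipe: true, Puppeteer talks to the browser over file descriptors
  // inherited by the child process instead of a localhost-bound WebSocket,
  // so there is no TCP port for other local users to reach.
  const browser = await puppeteer.launch({ pipe: true });
  const page = await browser.newPage();
  await page.goto("https://example.com");
  console.log(await page.title());
  await browser.close();
})();
```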
jesprenj|11mon ago
Great, I stand corrected! I still don't know how they convince Firefox/Chromium to use a pipe instead of a WebSocket as the transport layer.
_heimdall|11mon ago
I have always wanted a browser automation tool that taps directly into the accessibility tree. Plenty support querying based on accessibility features, but unless I'm mistaken none go directly to the same underlying accessibility tree used by screen readers and similar.
Happy to be wrong here if anyone can correct me. Having every test confirm both functionality and accessibility in one go would be much nicer than testing against hard-coded test IDs and separately writing a few a11y tests when I'm given the time.
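For reference, Puppeteer does expose a serialized accessibility snapshot via `page.accessibility.snapshot()`, though, as the comment says, it reflects the browser's internal tree rather than the platform tree (UIA, AX API, AT-SPI) that screen readers actually consume. A minimal sketch:

```ts
import puppeteer from "puppeteer";

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com");

  // Serialized view of the accessibility tree the browser builds for the
  // page (roles, names, values). Note this is Chromium's internal tree,
  // not the platform tree that screen readers consume.
  const snapshot = await page.accessibility.snapshot();
  console.log(JSON.stringify(snapshot, null, 2));

  await browser.close();
})();
```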
jahewson|11mon ago
It depends on what you're testing. Much of a typical page is visual noise that is invisible to the accessibility tree but is often still something you'll want tests for. It's also not uncommon for accessible UI paths to differ from regular ones via invisible screen-reader-only content, e.g. in a complex dropdown list. So you can end up in a situation where you test that the accessible path works but not regular clicks!
If you really want gold standard screen reader testing, there’s no substitute for testing with actual screen readers. Each uses the accessibility tree in its own way. Remember also that each browser has its own accessibility tree.
_heimdall|11mon ago
Yeah those are interesting corner cases for sure.
When UI is only visual noise and has no impact on functionality, I don't see much value in automated testing for it. In my experience these cases are often related to animations and notoriously difficult to automate tests for anyway.
When UX diverges between UI and the accessibility tree, I'd really expect that to be the exception rather than the rule. There would need to be a way to test both in isolation, but when one use case diverges down two separate code paths, it's begging for hard-to-find bugs and regressions.
Totally agree on testing with screen readers directly though. I can't count how many weird differences I've come across between Windows (IE or Edge) and Mac over the years. If I remember right, there was a proposed spec for unifying the accessibility tree and related APIs but I don't think it went anywhere yet.
regularfry|11mon ago
Guidepup looks like it's a decent stab in that direction: https://www.guidepup.dev/
Only Windows and macOS though, which is a problem for build pipelines. I too would very much like the page descriptions and the accessibility inputs to be the primary way of driving a page. It would make accessibility the default, rather than something you have to argue for.
_heimdall|11mon ago
That's an interesting one, thanks!
Skimming through their getting-started guide, I wonder how translations would be handled. It looks like the tests validate what the actual screen reader says rather than just the tree; for example, their first test finds the Guidepup header in their README by waiting for the screen reader to say "Guidepup heading level 1".
If you need to test different languages, you'd have to match the phrasing used by each specific screen reader when reading the heading descriptor and text. Your tests are also vulnerable to any phrasing changes made to each screen reader; if VoiceOver changed its wording, it could break all your test values.
I bet they could hide that behind abstractions though, `expectHeading("Guidepup", 1)` or similar. Ideally it would just be a check against the tree itself, avoiding any particular screen reader implementation altogether.
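A sketch of that kind of abstraction; `expectHeading` is hypothetical (not a Guidepup or Puppeteer API) and assumes that checking the browser's accessibility snapshot, rather than spoken output, is acceptable:

```ts
import type { Page } from "puppeteer";

// Shape of the nodes returned by page.accessibility.snapshot() that we
// care about here (a subset of Puppeteer's serialized accessibility node).
type AXNode = {
  role: string;
  name?: string;
  level?: number;
  children?: AXNode[];
};

// Hypothetical helper: assert that a heading with the given text and level
// exists in the browser's accessibility tree, independent of how any
// particular screen reader would phrase it.
async function expectHeading(page: Page, text: string, level: number): Promise<void> {
  const tree = (await page.accessibility.snapshot()) as AXNode | null;

  const matches = (node: AXNode): boolean =>
    node.role === "heading" && node.name === text && node.level === level;

  const walk = (node: AXNode | null): boolean =>
    !!node && (matches(node) || (node.children ?? []).some(walk));

  if (!walk(tree)) {
    throw new Error(`Expected heading level ${level} with text "${text}"`);
  }
}

// Usage: await expectHeading(page, "Guidepup", 1);
```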
Nextgrid|11mon ago
Spawn it in a dedicated network namespace (to contain the TCP socket and make it unreachable from any other namespace) and use `socat` to convert it to a UNIX socket.
jesprenj|11mon ago
This is not always possible, as some machines don't support network namespaces, but it's a perfectly valid solution. It is Linux-only, though: do BSD-family OSes like macOS support UID and network namespaces?
jgraham|11mon ago
There's an issue open for this on the WebDriver BiDi issue tracker.
We started with WebSockets because that supports more use cases (e.g. automating a remote device such as a mobile browser) and because building on the existing infrastructure makes specification easier.
It's also true that there are reasons to prefer other transports such as unix domain sockets when you have the browser and the client on the same machine. So my guess is that we're quite likely to add support for this to the specification (although of course there may be concerns I haven't considered that get raised during discussions).
bryanrasmussen|11mon ago
I haven't researched it, but I would be surprised if Sikuli does this: http://sikulix.com/
notpublic|11mon ago
run it inside podman/docker
yoavm|11mon ago
I know this isn't what the WebDriver BiDi protocol is for, but I feel like it's 90% of the way to being a protocol through which you can create browsers with swappable engines. Gecko has come a long way since Servo, and it's actually quite performant these days. The sad thing is that it's so much easier to create a Chromium-based browser than it is to create a Gecko-based one. But with APIs for navigating, intercepting requests, reading the console, and executing JS, why not just embed the thing, strip away all the browser chrome around it, and let us create customized browsers?
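A rough sketch of what driving a browser over raw WebDriver BiDi looks like. The `id`/`method`/`params` envelope and command names such as `session.new` and `browsingContext.navigate` come from the spec; the WebSocket address below is a placeholder, since the real one is advertised by the browser when remote automation is enabled.

```ts
import WebSocket from "ws";

// Placeholder address: the actual BiDi endpoint is printed by the browser
// when it is launched with remote automation enabled.
const ws = new WebSocket("ws://127.0.0.1:9222/session");

let nextId = 1;
function send(method: string, params: object): number {
  const id = nextId++;
  // BiDi commands are JSON messages with an id, a method name and params.
  ws.send(JSON.stringify({ id, method, params }));
  return id;
}

ws.on("open", () => {
  send("session.new", { capabilities: {} });
});

ws.on("message", (data) => {
  console.log("<-", JSON.parse(data.toString()));
  // Once a session exists, navigation, request interception, console events
  // and script evaluation go over this same channel, e.g.
  // browsingContext.navigate with a context id from browsingContext.getTree.
});
```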
djbusby|11mon ago
I have dreamed about a swappable engine.
Like, a wrapper that handles my history and tabs and bookmarks, but lets me switch between rendering in Chrome or Gecko or Servo or whatever.
sorenjan|11mon ago
There used to be an extension for Firefox called "IE Tab for Firefox" that used the IE rendering engine inside a Firefox tab, for sites that only worked in IE.
pauldino|11mon ago
And Google made basically the opposite thing (Chrome Frame) to embed Chrome within Internet Explorer for sites that wouldn't work in IE.
hyzyla|11mon ago
The same idea as the built-in Internet Explorer in Microsoft Edge, where you can switch to Internet Explorer mode and open websites that only work correctly in Internet Explorer.
joshuaissac|11mon ago
There are some browsers that support multiple rendering engines out of the box, like Maxthon (Blink + Trident) and Lunascape (Blink + Gecko + Trident).
apatheticonion|11mon ago
Agreed. Headless browser testing is a great example of a case where an embeddable browser engine "as a lib" would be immensely helpful.
jsdom in the Node.js world offers a peek into what that might look like, though it lacks a lot of browser functionality, which makes it impractical for most use cases.
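A minimal jsdom sketch of the "engine as a lib" shape, and where it stops short:

```ts
import { JSDOM } from "jsdom";

// jsdom parses HTML and exposes a DOM plus a subset of web APIs in-process:
// no browser process, no sockets, no rendering pipeline.
const dom = new JSDOM(`<!DOCTYPE html><h1 id="title">Hello</h1>`);
const { document } = dom.window;

console.log(document.querySelector("#title")?.textContent); // "Hello"

// Anything that needs layout, painting or a real network stack (CSS
// rendering, real getBoundingClientRect values, etc.) is stubbed or
// missing, which is the gap described above.
```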