How to Get Started with Tree-Sitter

avodonosov · on May 28, 2023

Does anyone know why GitHub code navigation, which is based on tree-sitter [1], supports only a subset of the languages for which tree-sitter has parsers [2]? For example, Common Lisp is not supported. I asked GitHub [3], but they have been silent.

1 - the "search based" navigation is based on tree sitter https://docs.github.com/en/repositories/working-with-files/u...

2 - the list of parsers in the official docs: https://tree-sitter.github.io/tree-sitter/

3 - https://github.com/orgs/community/discussions/55704

euiq · on May 28, 2023

While we're talking about this, I've been quite disappointed by the new GitHub file browser: the new features look neat, but the core file browsing experience is significantly worse for me because the new UI is much more complicated and every action seems to do way more work.

I couldn't immediately find a way to opt out; I wouldn't mind the new file browser so much if I could drop into it on demand instead of having it forced down my throat.

eyelidlessness · on May 29, 2023

I’ve had few (some, but minimal) problems with the new file tree browser, but the actual individual file view has become a minefield. It hijacks find in page, presumably to enhance it with semantic relations, but it breaks frequently when navigating. Previous searches (cmd-G) will fall back to the browser’s native search. If you’re lucky, you searched the same term in both, and if you’re luckier still the search term is in the top N lines of code (because the code itself is lazy loaded I guess?). Most of the time I’m not lucky and I have to start over each time I view a different file. Even though it feels more like a “single page app” than it previously did, I lose all of that state every time I do anything.

I get what they’re trying to do, and in some ways it’s clearly beta growing pains, but in other ways it’s clear they’re fighting the browser environment in ways that don’t and probably shouldn’t work.

donio · on May 29, 2023

I ended up blocking keyboard events from a userscript to deal with the keyboard hijacking. This combined with opting out of all the "feature previews" has made it more or less usable again for now. But the writing is on the wall, I am not using GH for any new projects.

throwaway290 · on May 28, 2023

It was an optional opt-in labs feature that I tried for an hour then turned off, but now they force it.

The new UI is more busy for no good reason. Sidebar makes sense in a local IDE, but over web every time you expand a subtree it takes ages anyway. And because it merely exists it is distracting and makes pages load slower...

gurjeet · on May 28, 2023

Click on your profile in the top-right, and then in the resulting menu, click on 'Feature preview'. You might be able to disable it from there.

euiq · on May 29, 2023

It’s not there anymore, presumably because it graduated from beta earlier this month: <https://github.blog/changelog/2023-05-08-the-new-code-search...>

cstrahan · on May 28, 2023

I believe they use semantic (Haskell program that uses tree-sitter) for navigation: https://github.com/github/semantic

So the answer may be that semantic does not yet have support for the language in question.

ghuntley · on May 29, 2023

correct. adding support for new language is a matter of opening up a pull-request with the appropriate gluecode. https://github.com/github/semantic/blob/793a876ae45d38a6bd17...

avodonosov · on May 29, 2023

It would be more realistic to think of pull requests if GitHub allowed per-repo configs with custom parsing scripts, similar to GitLab LSIF support - https://docs.gitlab.com/ee/user/project/code_intelligence.ht.... Otherwise, how can a contributor be sure that his pull request really produces convenient code navigation in the GitHub UI? There is no way to test it.

avodonosov · on May 29, 2023

Semantic seems to be no longer used:

> During this time, the team migrated from a Semantic-based tagging service to one that operated entirely with (newly developed) Tree-sitter queries. Though Semantic performed well, using the Tree-sitter query language allowed faster iteration and avoided the operational overhead of a program-analysis framework.

https://queue.acm.org/detail.cfm?id=3487022

avodonosov · on May 29, 2023

Update: https://github.com/orgs/community/discussions/55704#discussi...

kfir · on May 28, 2023

I believe this paper has an answer to your question - https://queue.acm.org/detail.cfm?id=3487022

avodonosov · on May 28, 2023

Thank you very much for the link.

I don't see an answer in there, though.

They use tagging (as described also here: https://tree-sitter.github.io/tree-sitter/code-navigation-sy...). From both docs I assume `tree-sitter tags` works out of the box for any language that has a parser. (Since neither doc says to supply a custom config for the tag extraction query, I assume every language plugin provides tagging queries.)

awayto · on May 28, 2023

I worked with tree sitter a bit for a small project. It seems to be the case that, while it covers a large set of languages, the actual implementation of each subset doesn't follow the exact same underlying API. And it follows that there is no universal AST parser as far as I can tell. Languages have many types of atomic parts to such an extent that unifying them is a tremendously complex task.

avodonosov · on May 28, 2023

Have you used the `tree-sitter tags` command?

awayto · on May 28, 2023

Unfortunately not. My stint of use was focused around dynamically loading in any given tree-sitter-languageName package via docker and then parsing any given file type in a runtime environment, in order to coalesce differing languages' parts into a unified metadata structure.

So, I wasn't trying to identify named things, like the tags command might seem to do, but more generic parts of the languages like function, var, import/export statements, etc. I ultimately made my own walk functionality, but pulling out those atomic parts per language is still a very heavy task (basically need a mini lexer for each lang), even though TS seems to provide all the info to do the task.
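A rough sketch of what such a unified structure could look like, in Python. Everything here is hypothetical: the `Node` class is a toy stand-in rather than tree-sitter's API, and the node type names in the tables are recalled from the tree-sitter-python, tree-sitter-javascript and tree-sitter-go grammars, so treat them as assumptions to check against each grammar.

```python
from dataclasses import dataclass, field

# Hypothetical per-language tables mapping grammar node types to shared kinds.
# The node type names are assumptions recalled from the respective grammars.
KIND_BY_LANGUAGE = {
    "python": {
        "function_definition": "function",
        "import_statement": "import",
        "import_from_statement": "import",
    },
    "javascript": {
        "function_declaration": "function",
        "import_statement": "import",
        "export_statement": "export",
        "lexical_declaration": "variable",
    },
    "go": {
        "function_declaration": "function",
        "import_declaration": "import",
        "var_declaration": "variable",
    },
}


@dataclass
class Node:
    """Toy stand-in for a syntax node: a type, a start row, and children."""
    type: str
    row: int
    children: list = field(default_factory=list)


def collect_parts(language, root):
    """Walk the tree depth-first and return (kind, node type, row) triples."""
    table = KIND_BY_LANGUAGE[language]
    parts = []
    stack = [root]
    while stack:
        node = stack.pop()
        kind = table.get(node.type)
        if kind is not None:
            parts.append((kind, node.type, node.row))
        # Push children reversed so they are visited in source order.
        stack.extend(reversed(node.children))
    return parts


js_tree = Node("program", 0, [
    Node("import_statement", 0),
    Node("lexical_declaration", 2),
    Node("function_declaration", 4, [Node("statement_block", 4)]),
])
print(collect_parts("javascript", js_tree))
```

The walk itself is language-independent; the heavy part described above is filling in and maintaining one table per grammar.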

ghuntley · on May 29, 2023

ah, easy. it's because support has not been added into https://github.com/github/semantic which is the tech that powers the GitHub UI. Adding support is pretty easy/mainly glue code [1] that imports the tree sitter API.

[1] https://github.com/github/semantic/blob/793a876ae45d38a6bd17...

xvilka · on May 29, 2023

In my case, Tree-Sitter grammar for C has many shortcomings (ignore preprocessor bugs, they can't be solved in the current TS model anyway): https://github.com/tree-sitter/tree-sitter-c/issues

nektro · on May 28, 2023

title should include "in Emacs"

gumby · on May 28, 2023

Given how powerful Emacs is and how important it has been for my computing over the past four decades, I think it would be more useful to me for people to label all non-emacs articles [Not Emacs]

bitwize · on May 29, 2023

Emacs is relatively obscure now. It is most practical to assume that all editor-related articles are about Visual Studio Code unless indicated otherwise, since a majority of developers use that. Vim if the article concerns in-terminal editor use.

0x457 · on May 29, 2023

sar·casm

noun: sarcasm; plural noun: sarcasms

the use of irony to mock or convey contempt.

"his voice, hardened by sarcasm, could not hide his resentment"

toadi · on May 29, 2023

his voice hardened... I didn't hear that in the comment. Maybe /s at the end of the comment would have helped make up for the missing body language or verbal cues.

/s

pcstl · on May 28, 2023

I think adding "in Emacs" to the title of an article published on Mastering Emacs Dot Org might be a bit redundant.

wafflemaker · on May 28, 2023

Scrolled over the title (thought it was some programming algorithm concept) and then by chance I've read the site name saying it's about Emacs which prompted me to open the discussion.

Maybe on some setups the site name is not as easily visible as the title?

layer8 · on May 29, 2023

What’s visible on the HN page is “masteringemacs”, which is just nonobvious enough to parse (“master in <whatever>”) that it doesn’t register.

agumonkey · on May 28, 2023

sometimes a bit of redundancy doesn't hurt

mekster · on May 28, 2023

Is everyone supposed to comprehend the domain name of the post?

nico · on May 28, 2023

There is a very cool gif demo at the bottom showing contextual editing with multiple cursors using an extension

Some pictures/video or gif would have been really nice at the top of the article to get a quick idea of what I’ll be able to get by following the guide

mickeyp · on May 28, 2023

Good shout. I've added another example image near the top, instead of burying the lede for people wondering what it's about.

nico · on May 28, 2023

Amazing, thank you

nanna · on May 29, 2023

Mickey Petersen's writing is like that of an explorer charting wildernesses that only a few have yet had the capacities to visit.

gHA5 · on May 29, 2023

I used Tree-sitter to navigate and extract syntactic information from source code files in multiple programming languages. It was great that I didn't have to use multiple parser (generator) libraries.

hifikuno · on May 29, 2023

Focusing on the Tree-Sitter side, it's something I've been meaning to look into. We have a lot of Oracle SQL files that make up our ETL pipeline at work, and I would love to be able to visualize the flow of the data. I know there are some products out there but our ETL was written from scratch without any tooling besides Oracle SQL and bash scripts (and cron).

I feel like a well written Tree-Sitter grammar should allow me to parse the files and follow the data from source to dashboard.

nerdponx · on May 29, 2023

I believe there is a grammar for SQL, but it probably doesn't support a lot of the vendor-specific language extensions you might be using. You could end up writing your own grammar or heavily extending any grammar that already exists. There might already be libraries specific to parsing Oracle SQL into a syntax tree, at which point Tree Sitter isn't really adding value.

That said, the ability to query for things like "all SELECT statements where table 'xyz' is referenced in any table identifier" is very powerful.

hifikuno · on June 4, 2023

That's a good idea, that would save me starting from scratch.

We do use a fair bit of vendor-specific language, would be cool to make a good Oracle version and then commit back to the project.

z3t4 · on May 28, 2023

I take it that many people like the idea with tree-sitter but don't really know how to use it. Tree-sitter does have documentation but it's not that useful.

It would be interesting to know how many sales you get from a niche book like this though.

alwaysbeconsing · on May 28, 2023

There's not much end-user documentation the Tree-sitter project can really provide. It's a programming interface. The stuff that an Emacs, NeoVim, ... user is actually interacting with depends on the integration in the editor. So it's up to those docs to explain what features they expose based on Tree-sitter.

For example, the multiple cursor thing at the end is enabled by having the concrete syntax tree from Tree-sitter's parse, but Tree-sitter has nothing whatsoever to do with the cursors: it just provides locations in the text based on particular queries (like "all `identifier` nodes named 'foo' that are sub-nodes of this other node").
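To make that division of labour concrete, here is a minimal sketch. It does not use Tree-sitter at all: Python's standard `ast` module stands in for the parser, and the function name `identifier_locations` is made up for this illustration. The point is only that the tree yields locations, and placing cursors there is left to the editor.

```python
import ast

SOURCE = """\
def outer(foo):
    bar = foo + 1
    return foo * bar

foo = 10
"""


def identifier_locations(source, function_name, identifier):
    """Return sorted (line, column) pairs for `identifier` inside one function."""
    tree = ast.parse(source)
    locations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            # Only look at sub-nodes of the matching function.
            for sub in ast.walk(node):
                if isinstance(sub, ast.Name) and sub.id == identifier:
                    locations.append((sub.lineno, sub.col_offset))
                elif isinstance(sub, ast.arg) and sub.arg == identifier:
                    locations.append((sub.lineno, sub.col_offset))
    return sorted(locations)


# The module-level `foo` on line 5 is outside `outer`, so it is not reported.
print(identifier_locations(SOURCE, "outer", "foo"))  # [(1, 10), (2, 10), (3, 11)]
```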

Even the headliner feature, syntax highlighting, isn't provided directly by Tree-sitter. It's up to the client system to inspect the syntax tree and apply attributes to its rendered text -- however it does that rendering.

nerdponx · on May 29, 2023

The query language and CLI tools are absolutely user-facing features on their own. But even if they aren't, developers still need to be able to learn from the docs. Currently that's very difficult to do. My experience is that you have to rely heavily on copying other people's examples in order to make progress.

alwaysbeconsing · on May 29, 2023

Point taken about the developer documentation. But the query language is only user-facing if the tool you're using exposes it. And the CLI only if you need to build your own grammars; I expect that to be less and less common as time goes on.

nerdponx · on May 29, 2023

Maybe this is a "blind spot" of sorts to the developers of Tree Sitter, but I find that the CLI is directly useful for end users to perform queries on their code. Maybe it would be better if we had dedicated tools that were more oriented at end users rather than developers of TS grammars, but it seems silly not to use a perfectly good tool that already exists.

Somewhat of an aside about building grammars, I have found that the grammars are relatively hard to make small modifications or extensions to. Whereas it's relatively easy with e.g. traditional Vim syntax highlighting. Making grammars easier to extend would be valuable for users of languages like SQL that have a zillion custom dialects.

alwaysbeconsing · on May 30, 2023

> I find that the CLI is directly useful for end users to perform queries on their code

That's interesting; what do they use the results for?

> I have found that the grammars are relatively hard to make small modifications or extensions to

Definitely true. I think this may be an inherent problem of the system and the parser generation, though. I am not sure it's solvable by the grammar authors.

nerdponx · on May 30, 2023

I use it for code search. I can search for the definition of a given function, or all places that a particular function is called by name.
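A sketch of the call-site half of that kind of search, with Python's standard `ast` module standing in for a Tree-sitter grammar and query (the `find_calls` helper is hypothetical, not part of any CLI). Unlike a text search, it skips mentions of the name that are not calls.

```python
import ast

SOURCE = """\
def parse(text):
    return text.split()

def main():
    words = parse("a b c")
    print(len(words), parse("d"))
"""


def find_calls(source, name):
    """Return sorted (line, column) pairs where `name` is called directly."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == name
        ):
            calls.append((node.lineno, node.col_offset))
    return sorted(calls)


# The definition on line 1 is not a call, so only lines 5 and 6 are reported.
print(find_calls(SOURCE, "parse"))  # [(5, 12), (6, 22)]
```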

pimeys · on May 28, 2023

Helix editor has it built-in so it's a good place to start looking into tree-sitter. For me the important parts are:

- Much better (and faster) syntax highlighting

- Shrink and expand selection (select variable, all variables inside parentheses, the whole block, the whole function)

- Select next or previous sibling node

- Goto matching bracket from where I am right now

- Jump between functions

- Jump between type definitions

- Jump between parameters

- Jump between comments

- Jump between tests

I think Helix must be the easiest way to start experimenting with tree-sitter. You need no plugins, just install the editor and start experimenting:

https://helix-editor.com/
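The "expand selection" item in the list above boils down to finding the smallest syntax node that strictly contains the current selection. The following is a minimal sketch of that idea, not Helix's implementation: it uses Python's standard `ast` module in place of Tree-sitter, handles single-line sources only, and `expand_selection` is a made-up name.

```python
import ast

SOURCE = "total = compute(alpha, beta + 1)"


def expand_selection(source, start, end):
    """Return the column span of the smallest node strictly containing
    [start, end), or None if nothing larger exists. Single-line sources only."""
    best = None
    for node in ast.walk(ast.parse(source)):
        if not hasattr(node, "col_offset"):
            continue  # Module and operator nodes carry no position
        s, e = node.col_offset, node.end_col_offset
        contains = s <= start and end <= e and (s, e) != (start, end)
        if contains and (best is None or e - s < best[1] - best[0]):
            best = (s, e)
    return best


selection = (23, 27)  # the identifier `beta`
while selection is not None:
    print(selection, repr(SOURCE[selection[0]:selection[1]]))
    selection = expand_selection(SOURCE, *selection)
```

Each step grows the selection from `beta` to `beta + 1`, then to the whole call, then to the whole assignment. Shrinking is the reverse walk, which an editor can implement by remembering the previous selections.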

catgoose · on May 29, 2023

pfft I can do that with 105 plugins in neovim

ninepoints · on May 28, 2023

In what way is its documentation not useful? I did a project based on TreeSitter a year ago and the docs were perfectly clear and adequate, and I was up and running within half an hour or so

z3t4 · on May 29, 2023

For example the Query Syntax: I would like an example of how to find all function declarations and where in the code each function starts and ends. As well as example data that such a query is supposed to return. And how to do it in the JavaScript/Web Tree-sitter binding =)
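As a hedged illustration of the shape of data such a query returns: the sketch below does not use the JavaScript binding. It uses Python's standard `ast` module as a stand-in and reports each function's name with its start and end position. The query string in the comment is an assumption about how the equivalent pattern would look for the tree-sitter-javascript grammar; it is not executed here.

```python
import ast

# Assumed Tree-sitter query for the JavaScript grammar (not run in this sketch):
TREE_SITTER_QUERY = "(function_declaration name: (identifier) @name) @function"

SOURCE = """\
def first():
    return 1


def second(x):
    if x:
        return x
    return None
"""


def function_spans(source):
    """Return one record per function: name, (line, col) start, (line, col) end."""
    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            spans.append({
                "name": node.name,
                "start": (node.lineno, node.col_offset),
                "end": (node.end_lineno, node.end_col_offset),
            })
    return sorted(spans, key=lambda span: span["start"])


for span in function_spans(SOURCE):
    print(span)
# {'name': 'first', 'start': (1, 0), 'end': (2, 12)}
# {'name': 'second', 'start': (5, 0), 'end': (8, 15)}
```

Note that `ast` counts lines from 1, whereas Tree-sitter reports zero-based rows, so the numbers would differ by one.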

ossusermivami · on May 29, 2023

I am in awe of how Emacs is able to reinvent itself again and again (along with vim/neovim).

audiodude · on May 28, 2023

This is so hopelessly complicated that I'm glad I use VSCode and not Emacs anymore. Sheesh.

pridkett · on May 28, 2023

I use both VS Code and Emacs. One of the things I love about Emacs is that if you put enough work into it, it will be like nothing else. There are times when I’m writing code, a text file, or even managing git commits with magit that I can sit back and say “wow, this is genuinely a pleasure to use.”

But, it takes a lot of work to get there. I still don’t have everything working super well, debugging is way easier for me in VS Code. But, I’m still learning (after 25+ years as an Emacs user), and that brings me joy.

It’s like when I’m working on electronics. There’s a genuine joy I get from using my Hakko soldering station, Mitutoyo calipers, or my Engineer hand tools. Using something that is supremely well designed for a purpose brings me joy.

And org-mode. Seriously, org-mode.

okasaki · on May 28, 2023

OK? It's a feature in development. Eventually Emacs 29 with tree-sitter will be part of distro packaging, and it will just be an "apt install emacs" away.

audiodude · on May 28, 2023

That's not the impression I got from TFA. It sounds like, once you have Emacs compiled with tree-sitter, you also need to have a language binding in a shared library in a known location (that you have to compile yourself or depend on the kindness of strangers). And once you have that, you actually need someone to write a major mode that utilizes it at all. And if you want to make that major mode the default for that file type, there are a few other things you need to do. Etc.

natrys · on May 29, 2023

> once you have Emacs compiled with tree-sitter

Once Emacs 29 releases, your distro will package Emacs compiled with tree-sitter.

> you also need to have a language binding in a shared library in a known location

Emacs already ships with a command that clones the grammar repo, compiles it, and installs the shared library to that known location - this was explained in the article. The only manual thing you need to do is to associate a language with a git repo in your configuration.

> And once you have that, you actually need someone to write a major mode that utilizes it at all.

Emacs developers have also been working on covering major languages to provide tree-sitter based major modes. I count 23 major modes already being maintained as part of Emacs that will be shipped soon as part of 29.1, not to mention there are a lot more in Melpa (centralised community package repository).

In any case, the whole thing with tree-sitter is that it really makes writing major modes easy. It's all declarative now, including indentation rules that had traditionally been tricky to get right.

slondr · on May 29, 2023

I use emacs 29 beta right now and that is just not how it works. For example for elixir you just install elixir-ts-mode like any other package, answer yes to the prompt to install the elixir ts library, and you’re done.

Even that setup step will be unnecessary in emacs 30, when all this stuff will be shipped by default.

audiodude · on May 29, 2023

Okay, that's a good thing! From the article:

> Neither tree-sitter nor Emacs come installed with language grammars

> ...it’ll only work if you don’t have an exceptional setup (so it won’t work well unless you have GCC and run some flavor of Linux.)

> Determining if a grammar is available is not intuitive nor obvious unless you use elisp

> Note that, just because you have installed a grammar, does not mean Emacs supports it. Someone still has to write the – admittedly, way easier – syntax and indentation logic and all that good stuff.

> Annoyingly, there’s no easy way to see if you’re using the normal or the TS-powered major mode

> If you use Customize, then you don’t have to do anything, but if you normally use setq, you’ll have to use customize-set-variable instead to ensure the setter is called properly.

> That sounds like a great idea until you realize that it is not possible to make one-size-fits all commands that do this. Believe me: I’ve tried.

foobarbaz33 · on May 28, 2023

Does VS code even integrate tree sitter? I'd imagine leveraging tree sitter is even more hairy there.

_pvxk · on May 29, 2023

Not yet. If anyone manages to read through all of https://github.com/Microsoft/vscode/issues/50140 they might find out if it's planned or not; Mickey's article is shorter =P

audiodude · on May 29, 2023

I guess the implication was that VS Code works well and I don't need tree-sitter?

ar_lan · on May 29, 2023

Imagine not even desiring to understand the tools that enable you to do your job.

d357r0y3r · on May 28, 2023

It feels like people use these editors for the novelty of it and not because it's truly the best for productivity.

snapdaddy · on May 28, 2023

I will treat your comment as serious and explain why I personally use emacs.

The thing I love most about emacs is that it is joyously consistent. The same key combination to jump ahead by a word works everywhere, such as when you're opening files or navigating directories. This consistency means that I am often very efficient trying new packages that I have never used.

Maybe you've never seen someone use emacs in anger? If so, check out this video of Steve Yegge doing some stuff in Emacs: https://youtu.be/lkIicfzPBys?t=142

distantsounds · on May 29, 2023

please point me towards literally _any_ text editor that dynamically remaps shortcuts on you.

okasaki · on May 28, 2023

Ah yes, the novelty of using emacs, an editor that started in 1976.

chlorion · on May 29, 2023

It's definitely just a fad that will die out any day now!

lmm · on May 29, 2023

Well, the nature of emacs is that you can always add a line to your config file and have it radically change everything.

Barrin92 · on May 28, 2023

Aside from the fact that I don't think I've ever heard emacs associated with novelty (the thing is as old as Stonehenge), I wish this productivity cult would die. Programming is fun and play. Discovery and experimentation are great things.