The longest word you can type on the first row

neilk · 2023-08-25T21:05:51

I think we can do better than 11 characters!

I happen to have a corpus which includes pretty much every word ever written in a book, including many misspelled, mistranscribed, or otherwise non-dictionary words.

After eliminating nonsense, non-English, or other mistakes, I think the real winner, coming in at 12 characters, is:

    teetertotter

That's a relatively common word. Even though it's usually seen hyphenated, the unhyphenated form is recognized by all the online dictionaries I found.

----

And some other candidates, just for fun, in the 13 or 12 character range:

    proproprietor
    priorityqueue
    reporterette
    preprototype

"proproprietor" seems more like a misspelling. Should have a hyphen, or be two words.

"priorityqueue" is of course familiar to hackers here, but is more of a jargon term, and is only concatenated due to appearing in source code. Invariably it's two words when actually written out.

"reporterette" is antique, but appeared in a NYTimes headline as late as 2018 - the author reflected on her career, including sexist epithets. https://www.nytimes.com/2018/12/02/opinion/george-hw-bush-ma...

"preprototype" is used exactly as is, in lots of scientific papers, up to the current day. That's a pretty good one too, and could be a tie for "teetertotter", but it's verging on jargon.

soultrees · 2023-08-26T06:30:09

How did you scrape that data? How do you store and retrieve it? Is it just a standard db or a vector db?

Sorry for the questions, but it seems like an interesting, yet probably common data set and as someone who is venturing down this path, I’d like to learn more about building my own dataset similar to this from scratch.

neilk · 2023-08-26T17:59:08

> standard db or vector db

lol, it's a 42MB text file from Google Books Ngrams.

The format looks like this:

    $ head words-all.txt

    a       14219615690
    a!      196012
    a"      84
    a'      47713
    a'0     3036
    a'1     4070
    a'10    99
    a'11    56

I queried it with perl and sort.

    $ time perl -wlane 'if ($F[0] =~ /^[qwertyuiop]+$/) { print length($F[0]), "\t", $F[0] }' words-all.txt | sort -rn > qwertywords

    real 0m1.915s
    user 0m1.896s
    sys 0m0.025s

I can't remember exactly which file I downloaded, but according to my notes I got it from here back in 2012 or so.

https://storage.googleapis.com/books/ngrams/books/datasetsv2...

There seems to be a newer corpus published in 2020:

https://storage.googleapis.com/books/ngrams/books/datasetsv3...
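For readers who would rather not use perl, here is a minimal Python sketch of the same filter over the two-column "word, count" format shown above. The `min_count` parameter and the sample counts for the longer words are invented for illustration; they are not from the corpus.

```python
import re
from typing import Iterable, List, Tuple

# Matches words made only of QWERTY top-row letters.
TOP_ROW = re.compile(r"[qwertyuiop]+")

def top_row_words(lines: Iterable[str], min_count: int = 0) -> List[Tuple[int, str]]:
    """Return (length, word) pairs for top-row-only words, longest first."""
    found = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue  # skip blank or malformed lines
        word, count = fields[0], int(fields[1])
        # min_count is a hypothetical knob for dropping rare OCR noise.
        if count >= min_count and TOP_ROW.fullmatch(word):
            found.append((len(word), word))
    # Descending by length; ties come out in reverse alphabetical order.
    return sorted(found, reverse=True)

# Made-up sample lines in the same shape as words-all.txt.
sample = [
    "a       14219615690",
    "a!      196012",
    "teetertotter 1200",
    "typewriter 500000",
    "priorityqueue 3",
]

for length, word in top_row_words(sample, min_count=100):
    print(length, word)
```

To run it on the real file, pass an open file handle (for example `open("words-all.txt")`) in place of `sample`.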

bnjmn · 2023-08-24T21:49:38

On any macOS computer (or replace /usr/share/dict/words with your own word list):

  grep '^[qwertyuiop]*$' /usr/share/dict/words | \
  awk '{ print length(), $0 }' | \
  sort -n

juujian · 2023-08-24T21:57:27

Works for Ubuntu, too. My Colemak self can only get fluffy (6) from the front row, that's the longest word. Middle row really shines though, I can get hardheartedness (15) or assassinations (14).

tiltowait · 2023-08-24T22:16:14

Interesting that your dictionary doesn't have "tenderheartedness", which is two letters longer.

jmholla · 2023-08-26T04:43:47

"tenderheartedness" uses every row, not one row.

cbsks · 2023-08-27T03:34:45

Not on a Colemak keyboard
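The disagreement above comes down to which layout you check against. A small Python sketch that reports which rows a word touches under a given layout; the Colemak rows are not spelled out in the thread and are filled in here from the standard layout, letters only, so treat them as an assumption.

```python
# Letter rows per layout, top to bottom; punctuation keys are left out.
LAYOUTS = {
    "qwerty": ("qwertyuiop", "asdfghjkl", "zxcvbnm"),
    "colemak": ("qwfpgjluy", "arstdhneio", "zxcvbkm"),  # assumed standard Colemak
    "dvorak": ("pyfgcrl", "aoeuidhtns", "qjkxbmwvz"),
}

def rows_used(word, layout):
    """Return the sorted row indices (0 = top) that a word touches."""
    rows = LAYOUTS[layout]
    used = set()
    for ch in word.lower():
        for index, letters in enumerate(rows):
            if ch in letters:
                used.add(index)
    return sorted(used)

for name in ("qwerty", "colemak"):
    print(name, rows_used("tenderheartedness", name))
```

On QWERTY the word touches all three rows, while on Colemak it stays on the home row, so both replies are right for their own layout.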

seabass-labrax · 2023-08-24T23:00:03

Gulp, fluffy puppy pug! Yup. Fly, ugly pup, fly.

I note that hardheartedness and hotheadedness threaten the darnedest nonstandard assassinations. Such sordidness!

travisgriggs · 2023-08-24T22:39:36

Nice.

Middle/Second row result is

8 flagfall "Flagfall, or flag fall, is a common Australian expression for a fixed start fee, especially in the taxi, haulage, railway, and toll road industries."

8 galagala "A name in the Philippine Islands of Dammara Philippinensis, a coniferous tree yielding dammar-resin."

Lower/Third Row: - None

There are no vowels on the bottom row. So no words. I've been typing at ~ 50wpm for 30 years, and I don't think I'd ever actually consciously recognized this fact about the bottom row.

(standard US keyboard layout)

JoshTriplett · 2023-08-24T23:55:07

For QWERTY, I found two nine-letter words using only the middle row: halakhahs and haggadahs.

And yeah, nothing in the bottom row other than acronyms and similar pseudo-words.

Symbiote · 2023-08-25T23:01:58

Dvorak:

  ',.PY FGCRL   pry or Lyly
  AOEUI DHTNS   tendentiousness
  ;QJKX BMWVZ   xxxv, www, bbq or mm

After 'apt install wbritish-insane'

  pyrryl (a chemical group)
  unostentatiousnesses (and anaesthetisations is good too)
  mmmm

tedunangst · 2023-08-24T23:38:26

Knuth vs McIlroy all over again.

IshKebab · 2023-08-25T22:07:11

Just use https://www.visca.com/regexdict/

susam · 2023-08-24T23:17:43

On macOS version 12.1 Monterey:

  $ grep '^[qwertyuiop]*$' /usr/share/dict/words | awk '{print length, $0}' | sort -rn | head
  11 rupturewort
  11 proterotype
  11 proprietory
  10 typewriter
  10 tetterwort
  10 repetitory
  10 repertoire
  10 proprietor
  10 pretorture
  10 prerequire

On Debian GNU/Linux 11 (bullseye):

  $ grep '^[qwertyuiop]*$' /usr/share/dict/words | awk '{print length, $0}' | sort -rn | head
  10 typewriter
  10 repertoire
  10 proprietor
  10 perpetuity
  9 typewrote
  9 typewrite
  9 territory
  9 repertory
  9 puppeteer
  9 prototype

jodrellblank · 2023-08-25T23:15:20

Dyalog APL, using the enable1 wordlist, I don't know its origins but you can get it from Peter Norvig's website https://norvig.com/ngrams/enable1.txt or various GitHubs and Gists:

          ↑7↑{⍵[⍒≢¨⍵]}words/⍨{''≡⍵~'qwertyuiop'}¨words
    ┌→─────────┐
    ↓peppertree│
    │perpetuity│
    │prerequire│
    │proprietor│
    │repertoire│
    │typewriter│
    │etiquette │
    └──────────┘

Reading from the right, "test each word by removing 'qwertyuiop' and see if it leaves an empty string, use the test results to filter the input word list, descending-sort the length of each word and use that to arrange(index) the remaining words, flatten the array and take the top 7".

(Longest from the middle row is 'haggadahs' then 'alfalfas', third row is 'mm')
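For anyone who does not read APL, the steps described above translate fairly directly. This is a rough Python rendering of the same idea, not a claim about how Dyalog evaluates it; the word list in the example is made up.

```python
def longest_top_row(words, top=7):
    """Filter to top-row-only words, sort longest first, keep the first `top`."""
    # Test each word: removing the top-row letters must leave nothing.
    passes = [not (set(w) - set("qwertyuiop")) for w in words]
    # Use the test results to filter the input list.
    kept = [w for w, ok in zip(words, passes) if ok]
    # Sort by length, descending; Python's sort is stable, so ties keep input order.
    kept.sort(key=len, reverse=True)
    # Take the top few.
    return kept[:top]

print(longest_top_row(["typewriter", "apple", "pepper", "tree", "etiquette", "queue"], top=3))
```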

JoshTriplett · 2023-08-24T23:52:43

For Debian, try installing one of the larger wordlists, such as wamerican-huge or wbritish-huge; those have "rupturewort".

codetrotter · 2023-08-24T23:47:51

FreeBSD 13.2

    % grep '^[qwertyuiop]*$' /usr/share/dict/words | awk '{print length, $0}' | sort -rn | head

    11 rupturewort
    11 proterotype
    11 proprietory
    10 typewriter
    10 tetterwort
    10 repetitory
    10 repertoire
    10 proprietor
    10 pretorture
    10 prerequire

So it seems that, in addition to macOS having parts of its kernel based on FreeBSD, there are also a lot of similarities between the wordlist at /usr/share/dict/words on macOS and that of FreeBSD :) perhaps even the same?

p1mrx · 2023-08-25T20:46:39

MS-DOS 6.22

    C:\>grep '^[qwertyuiop]*$' /usr/share/dict/words | awk '{print length, $0}' | sort -rn | head
    Bad command or file name
    Bad command or file name
    SORT: Too many parameters
    Bad command or file name

jodrellblank · 2023-08-26T03:19:52

MS-DOS 6.22 (excluding any typos as I rewrote it, I only did a proof of concept with a few words but it seemed to work).

    @echo off
    mkdir c:\t
    echo prompt echo %1 $g C:\t\%1 >> c:\temp.bat
    command /c temp.bat %1 > c:\h.bat
    del c:\temp.bat
    
    type words.txt | find /V "a" | find /V "s" | find /V "d" | find /V "f" | find /V "g" | find /V "h" | find /V "j" | find /V "k" | find /V "l" | find /V "z" | find /V "x" | find /V "c" | find /V "v" | find /V "b" | find /V "n" | find /V "m" > c:\words2.txt
    
    echo >  h.bas DIM word as STRING
    echo >> h.bas OPEN "C:\words2.txt" for INPUT as #1
    echo >> h.bas DO WHILE NOT EOF(1)
    echo >> h.bas  INPUT #1, word
    echo >> h.bas  CMD$ = "C:\h.bat " + word
    echo >> h.bas  SHELL CMD$
    echo >> h.bas LOOP
    echo >> h.bas CLOSE #1
    echo >> h.bas SYSTEM
    
    qbasic /RUN c:\h.bas
    
    dir /OS /B c:\t
    
    @del c:\t\*
    @rmdir c:\t
    @del c:\words2.txt
    @del c:\h.bat
    @del c:\h.bas

You can play with MS-DOS 6.22 in a virtual-machine-in-browser here[1]. That VM comes with Vim (non-standard) so use Vim or Edit to create a word list and save as c:\words.txt. Then type all this code into a batch file using `edit run.bat` and then run it with `run %1`. MS-DOS 6.22 came with QBASIC so I think that's allowed; I tried to avoid it but wasn't able to. NB. DOS is way less capable than Windows cmd prompt so there's no `for /f` or anything. "Dir /OS /B" sorts files by size and that view will leave the largest files on screen as the answer. The files will be one per word, containing the word so the size in bytes is the word length and the filename is the word to see it in the file listing. The words will be echoed into the files by a helper batch file containing `echo %1 > %1`. Building the helper batch file is hard because echo cannot echo > into a file. The qwerty filtering is a chain of `find /V "a"` for excluding each of the other rows cough. I then couldn't loop over the file lines without QBASIC.

[1] https://copy.sh/v86/?profile=msdos

If you never used MS-DOS classic, try "edit test.txt" and see how it has a nice TUI, where Alt+F brings up the File menu, the brightly coloured letters are the hotkeys, so Alt+F, X will quit. Shift+Down will select a line, Shift+Delete to cut and Shift+Insert to paste. Ctrl+left/right arrows to jump forward/back a word, Ctrl+Shift+Left/Right to select a word. 29 years later those keyboard patterns still work in this Firefox editor, in current notepad, WordPad and Word, and in my muscle memory. Escape tends to exit back out of popups and menus. Quit and try "help date" and see the TUI help, where the green angle brackets are hyperlinks and can be TAB'ed between, Enter to activate and Escape back. F1 is still the help key, only it actually showed offline help back then instead of doing a Bing search for 'get help in notepad'. Quit and run QBasic, see how F5 runs the code.

[2] by Geoff Cutter: https://groups.google.com/g/alt.msdos.batch/c/Ozg2C-ANCqI

[3] Phil Robyn's QBASIC loop sample https://groups.google.com/g/alt.msdos.batch/c/44NbdZJ2-p4/m/...
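The file-size trick described above (one file per word, named after the word and containing it, then a directory listing sorted by size) can be sketched in Python as well. This is only an illustration of the idea, assuming lowercase words that are valid file names; it is not a port of the batch file.

```python
import os
import tempfile

def longest_by_file_size(words):
    """Order words by the size of a file named after, and containing, each word."""
    with tempfile.TemporaryDirectory() as folder:
        for word in words:
            path = os.path.join(folder, word)
            with open(path, "w", encoding="ascii") as handle:
                handle.write(word)  # file size in bytes equals the word length
        names = os.listdir(folder)
        # Ascending by size, like `dir /OS`: the largest file is listed last.
        names.sort(key=lambda name: (os.path.getsize(os.path.join(folder, name)), name))
        return names

print(longest_by_file_size(["tree", "typewriter", "pup"]))
```

Any fixed per-file overhead, such as a trailing newline written by `echo`, would shift every size equally and leave the order unchanged.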

kazinator · 2023-08-28T06:14:21

Awk greps!

   awk '/^[qwertyuiop]+$/ {print length, $0}'

kristopolous · 2023-08-24T23:50:59

Here's something you may not know, the *-insane dictionaries, which are giant, are functions of OCR output and are known to contain lots of errors.

I found a few earlier this year and I was going to file a bug so I did some research to find out this is a known and expected behavior.

If the computer say reads stubborn as stubbum, the smaller dictionaries are the ones that have cross checked and filtered those out. The insane ones do not. It's a good name. "Lack of sanity checks"

Here's an example word I found, "suabilities". You'll find it only on wordlist sites that used this wordlist and I guess, now here.

colinchartier · 2023-08-25T20:29:28

Reminds me of the ghost Unicode character saga: https://www.dampfkraft.com/ghost-characters.html

kristopolous · 2023-08-28T02:48:42

just saw this. I've got no idea how kanji ocr works but I do know enough japanese to know what most of those characters are attempting to refer to, my penmanship has certainly been that bad. I still don't understand how it would make its way into the standard unless that part wasn't written by someone who is competent in japanese.

I wonder how often that happens - surely there's tons of people dealing with japanese text who can't read it and just use diligence to make sure the "letters are the same"

schoen · 2023-08-25T20:27:35

I've used the insane dictionaries a number of times for puzzle stuff and I never knew that they were derived from OCR output. Thanks for mentioning that!

seabass-labrax · 2023-08-24T23:04:27

You might find the... 'translation'[1] of Genesis 1 using only keys on the Colemak home row interesting:

  In the start The One has risen the stars and the earth.

  The earth had no order, and nothin' resided there; and shade resided on the nonendin' 'neath. And The One rided on the seas.

  Then The One said: "I desire it to shine"; and it shone.

  And The One had seen the shine, that it's neat; and The One sorted the shine on one side, and the shade on the other.

  The One then denoted the shine and the shade. So the nite and the shine that are date no. one had ended.

[1]: https://colemak.com/Fun

schoen · 2023-08-25T20:25:59

If you enjoy that, you might also enjoy this version that I wrote

https://godexperiment.org/beginnings-an-alliterative-rewrite...

It was inspired by these versions:

https://llamasandmystegosaurus.blogspot.com/2017/05/alpha.ht...

https://calvinballing.github.io/saga/

SethTro · 2023-08-24T21:49:45

For Dvorak with a little assist from unix

First row

$ awk '/^[,.pyfgcrl]*$/ { print length(), $0 }' /usr/share/dict/words | sort -nr | head

3 pry / ply / fry / cry

Second row

$ awk '/^[aoeuidhtns]*$/ { print length(), $0 }' /usr/share/dict/words | sort -nr | head

15 tendentiousness

14 assassinations

13 instantaneous

13 insidiousness

Third row

$ awk '/^[;qjkxbmwvz]*$/ { print length(), $0 }' /usr/share/dict/words | sort -nr | head

4 xxxv

3 xxx

3 xxv

2 xx

rwl4 · 2023-08-25T20:53:53

Hmm. My Mac shows these:

    [...]
    15 sententiousness
    15 sinuatodentated
    15 soundheadedness
    15 tendentiousness
    15 uninitiatedness
    16 antisensuousness
    16 ostentatiousness
    17 dissentaneousness
    17 instantaneousness
    18 unostentatiousness

Nekhrimah · 2023-08-24T22:29:49

Not sure about that third row, the "A" is in the second row.

SethTro · 2023-08-24T22:33:46

Awkward, now the third row doesn't return anything

    $ awk '/^[;qjkxbmwvz]*$/ { print length(), $0 }' /usr/share/dict/words | sort -nr | head
    4 xxxv
    3 xxx
    3 xxv
    2 xx
    2 xv

Nekhrimah · 2023-08-24T22:45:11

With those letters, not surprised!

lovehashbrowns · 2023-08-24T23:28:09

I tried to do some other fun things like going row by row with each row only contributing one letter and seeing what’s the longest word I could come up with.

If I start at the top row and go down, I can make TAXES but couldn’t think of a longer word. The third row having no vowels makes it so hard.

Starting at the bottom row and going up, I came up with CHICKEN which is delicious and neat that it ends where it started. Chickens is longer but ends on the middle row which is not as neat I feel like :(

JoshTriplett · 2023-08-24T23:50:13

> If I start at the top row and go down, I can make TAXES but couldn’t think of a longer word. The third row having no vowels makes it so hard.

A dictionary search turned up "paxwaxes" as the longest word I could find that starts in the top row and goes down, wrapping around to the top every three letters.

> Starting at the bottom row and going up, I came up with CHICKEN which is delicious and neat that it ends where it started. Chickens is longer but ends on the middle row which is not as neat I feel like :(

Chickens is indeed the longest.

If you start at the bottom row and go up-and-down: cataclysms, or catamarans.

If you start at the top row and go down-and-up: escapable.

If you start in the middle and go down-and-up: scarabaean

If you start in the middle and go up-and-down, I didn't find anything longer than 7 letters, and there were 39 seven-letter words, including "discard", "grandpa", and "stacked".
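The row patterns in this exchange are easy to check mechanically. A minimal Python sketch, assuming the standard QWERTY letter rows; the pattern names are made up for this example.

```python
from itertools import cycle

ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")  # 0 = top, 1 = middle, 2 = bottom

def row_of(ch):
    """Return the row index of a letter, or None if it is not on a letter row."""
    for index, letters in enumerate(ROWS):
        if ch in letters:
            return index
    return None

def follows(word, pattern):
    """True if the word's letters visit rows in the repeating order `pattern`."""
    return all(row_of(ch) == row for ch, row in zip(word.lower(), cycle(pattern)))

# Hypothetical names for the patterns discussed above.
TOP_DOWN_WRAP = (0, 1, 2)          # taxes, paxwaxes
BOTTOM_UP_WRAP = (2, 1, 0)         # chicken, chickens
BOTTOM_UP_AND_DOWN = (2, 1, 0, 1)  # cataclysms, catamarans
TOP_DOWN_AND_UP = (0, 1, 2, 1)     # escapable
MIDDLE_DOWN_AND_UP = (1, 2, 1, 0)  # scarabaean
MIDDLE_UP_AND_DOWN = (1, 0, 1, 2)  # discard, grandpa, stacked

print(follows("paxwaxes", TOP_DOWN_WRAP), follows("taxes", BOTTOM_UP_WRAP))
```

Filtering a word list with `follows` and taking the longest match reproduces this kind of search for any starting row and direction.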

DylanDmitri · 2023-08-24T21:45:05

Related, is there a high quality plaintext dictionary file for running similar searches? I’ve spent several hours but couldn’t find one that’s both comprehensive and accurate.

aidenn0 · 2023-08-24T21:54:31

What are your rules for what counts as a "word"? If you go with the basic scrabble rules (i.e. nothing that would be capitalized or punctuated) then YAWL[1] is pretty good, with the downside being the most recent version I know of is from 2008.

FYI, rupturewort is the sole 11-letter word answer to TFA in YAWL; found using:

    grep '^[qwertyuiop]*$' word.list |while read -r line; do echo "${#line} ${line}"; done |sort -n | tail

1: https://github.com/elasticdog/yawl

mminer237 · 2023-08-24T21:49:10

https://github.com/dwyl/english-words/blob/master/words_alph...

layer8 · 2023-08-24T23:11:45

https://packages.debian.org/bookworm/wordlist

jodrellblank · 2023-08-26T03:21:44

I linked in another comment, I use "enable1.txt" which is here on Peter Norvig's site: https://norvig.com/ngrams/enable1.txt

It's 170k English words, no placenames or people's names or anything like that, but does have some that I question how valid they are.

praash · 2023-08-24T21:54:50

Some common Linux distributions have packages that provide word list files to /usr/share/dict/ in several languages. It's likely for English files to be preinstalled. I've had plenty of fun practising regex and pipes with these word lists!

koolba · 2023-08-24T21:49:05

Dictionary or word list?

/usr/share/dict/words is always destination zero for words.

JoshTriplett · 2023-08-24T23:51:11

I'd recommend the SCOWL wordlist, which also has usage data (so you can decide how rare of words you want to include).

suzzer99 · 2023-08-24T21:46:02

The Welsh would love this game. Trywytypryrwy

jodrellblank · 2023-08-26T04:06:31

OpenStreetMap can be queried with a regex, so here's places in Wales where the Welsh name is only the top row: https://overpass-turbo.eu/?w=(name:cy~/^[qwertyuiop]*$/i)%20...

There's four: a hamlet named Treopert, a mountain range in Snowdonia named Eryri, a house in Bangor University named after that mountain range, and TUI outdoor shop.

Curiously the longest top-row places I can find anywhere on OpenStreetMap are almost all roads in France:

    Rue Pierre Ropert
    Rue Pierre Riquet (x2)
    Rue Pierre Poutot
    Rue Pierre Potier (x4)
    Rue Pierre Perret (x3)
    Rue Pierre Pietri
    Route Petit Peyre (x2)
    Poirier Pierrotte
    Petite Rue Pierre
    Rue Pierre Perrier (x3)
    Rue Pierre Pottier (x3)
    Rue Pierre Routier
    Rue Poirier Piquet
    Rue Pouyer Quertier (x2)
    Route Pourpre et Or
    Tyrepower Port Pirie       <-- a shop in Australia
    Petite Route Petite Rue

aidenn0 · 2023-08-24T21:59:15

One does wonder who dared Liam Dutton to do this: https://www.youtube.com/watch?v=fHxO0UdpoxM

bitwize · 2023-08-24T21:42:34

Germans would type on a QWERTZ keyboard (Z and Y are swapped). This may theoretically considerably open up the space of possible top-row-only German words, as Z is very common in German (especially after T).

allendoerfer · 2023-08-24T23:23:49

Which is exactly why the two letters are swapped.

jedberg · 2023-08-25T20:48:40

> I imagine German has some epic words that can be written in just the first row.

My understanding of German grammar is that words can be of infinite length since they allow unlimited compounding. But they also have a language authority that makes words official, and the current longest is 68 letters.

So it would indeed be an interesting exercise in German.

tauchunfall · 2023-08-26T02:49:24

What is the name of this language authority? I researched and found "Council for German Orthography" and "Gesellschaft für deutsche Sprache" (Association for the German Language).

Duden (the standard German dictionary) is the closest I know. And they list "Aufmerksamkeitsdefizit-Hyperaktivitätsstörung" (the German word for attention deficit hyperactivity disorder) with 44 letters as the longest in the dictionary [1].

*Update:* There are also [2] and [3] but they are both not anymore part of a law (the "-gesetz" suffix in the word) or regulation (the "-verordnung" suffix in the word), respectively.

[1] https://www.duden.de/sprachwissen/sprachratgeber/Die-langste... [2] https://en.wiktionary.org/wiki/Rindfleischetikettierungs%C3%... [3] https://en.wiktionary.org/wiki/Grundst%C3%BCcksverkehrsgeneh...

eitally · 2023-08-24T22:41:44

A number of years ago I solved a minor but repetitive QoL problem I had, and created a password I could type with just my left hand. It started as 8 characters, but I now have variants with as many as 15 characters. Not a word, or even words strung together, but it is so nice being able to just type it with one hand.

nbush · 2023-08-25T00:00:20

This post finally got me to dig back up the ultimate word trivia website, valiantly hosted on Tripod and still maintained: https://jeff560.tripod.com/words1.html

noxvilleza · 2023-08-24T22:55:15

I'm surprised that PROPRIETORY wasn't found in the Andrew Stephens 2009 search.

svat · 2023-08-25T13:17:40

Most dictionaries list only the standard form PROPRIETARY, so it is arguable whether a word list should contain "PROPRIETORY": although Merriam-Webster and Wiktionary (unlike most dictionaries) list it as an alternative spelling, and it does occur a few times in the wild (e.g. see https://books.google.com/ngrams/graph?content=proprietory%2C... for a comparison), it is not surprising that most word lists leave it out. (Or, even if it occurs in a word list, Stephens may have excluded it, and there is sufficient justification for doing so.)

contingencies · 2023-08-25T21:42:56

  $ axel -n4 https://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles.xml.bz2 && bunzip2 enwiktionary-latest-pages-articles.xml.bz2 # 1.1G -> 8.1G
  $ grep '<title>' enwiktionary-latest-pages-articles.xml  >titles # 282M
  $ sed 's/ *<title>//' titles |sed 's/<\/title>//' >clean-titles # 126M
  $ grep '^[qwertyuiop]*$' clean-titles | awk '{ print length(), $0 }' | sort -nr|grep ^1[23456789]

  15 retroproiettori  # italian <-- joint italian winner
  15 retroproiettore  # italian <-- joint italian winner
  13 topprioriteit    # dutch   <-- dutch winner
  13 ripitturerete    # italian
  13 retorqueretur    # latin   <-- joint latin winner
  13 repropitietur    # latin   <-- joint latin winner
  13 purpurroterer    # german  <-- german winner
  13 proterreretur    # latin   <-- joint latin winner
  13 perterreretur    # latin   <-- joint latin winner
  13 perquireretur    # latin   <-- joint latin winner
  12 teetertotter     # english <-- english winner
  12 riproporrete     # italian
  12 ripitturerei     # italian
  12 riotturerete     # italian
  12 retorquetote     # latin
  12 retorquerere     # latin
  12 requiriertet     # german
  12 requireretur     # latin
  12 repropitiere     # latin
  12 repperiretur     # latin
  12 purpurrotere     # german
  12 prototypique     # french  <-- french winner
  12 proterruerit     # latin
  12 proterrituro     # latin
  12 proterrituri     # latin
  12 proterriture     # latin
  12 proterretote     # latin
  12 proterrerere     # latin
  12 protereretur     # latin
  12 proriperetur     # latin
  12 proietterete     # italian
  12 priorytetowy     # polish  <-- joint polish winner
  12 priorytetowo     # polish  <-- joint polish winner
  12 prioriteetti     # finnish <-- finnish winner
  12 prepotettero     # italian
  12 portretterte     # bokmål  <-- joint bokmal winner
  12 portretterer     # bokmål  <-- joint bokmal winner
  12 portretteert     # dutch
  12 piroetterete     # italian
  12 pipettiertet     # german
  12 perterruerit     # latin
  12 perterrituro     # latin
  12 perterrituri     # latin
  12 perterriture     # latin
  12 perterretote     # latin
  12 perterrerere     # latin
  12 perquiritote     # latin
  12 perquirerere     # latin
  12 perpetuerete     # italian
  12 perpetrerete     # italian
  12 perpeteretur     # latin
  12 perequitetur     # latin
  12 iperprotetto     # italian
  12 iperprotetti     # italian
  12 iperprotette     # italian
  12 eqqoqqortooq     # greenlandic <-- greenlandic winner
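The grep/sed steps above can also be done in one pass. A rough Python sketch, assuming (as the pipeline does) that each `<title>` element sits on a single line of the dump; the sample lines are invented, and `min_length=12` approximates the final `grep ^1[23456789]`.

```python
import re

TITLE = re.compile(r"<title>(.*?)</title>")
TOP_ROW = re.compile(r"[qwertyuiop]+")

def top_row_titles(xml_lines, min_length=12):
    """Collect (length, title) for top-row-only page titles, longest first."""
    hits = []
    for line in xml_lines:
        match = TITLE.search(line)
        if not match:
            continue
        title = match.group(1)
        if len(title) >= min_length and TOP_ROW.fullmatch(title):
            hits.append((len(title), title))
    return sorted(hits, reverse=True)

# Made-up lines shaped like the dump.
sample_lines = [
    "    <title>teetertotter</title>",
    "    <title>retroproiettore</title>",
    "    <title>typewriter</title>",
    "    <title>Main Page</title>",
    "    <text>not a title</text>",
]
print(top_row_titles(sample_lines))
```

Passing a handle from `bz2.open(path, "rt", encoding="utf-8")` instead of `sample_lines` should let this read the compressed dump directly, which would avoid the 8.1G intermediate file.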

yzydserd · 2023-08-26T07:59:50

I found myself mildly annoyed that the author calls QWERTY the “first row” of letters not the “top row”. If there was a “first”, I might nominate ZXCVBNM. Is QWERTY commonly known as the “first row”?

jodrellblank · 2023-08-25T23:17:42

Pop quiz, what's the somewhat-related significance of these longest words?

    canvasbacks

    counterconvention

    photofluorographies

I_complete_me · 2023-08-26T09:03:30

I like your username so here's my answer:

    canvasbacks omits letters from the top row
    counterconvention omits letters from the middle row
    photofluorographies omits letters from the bottom row

Presumably the longest word available in each case?

jodrellblank · 2023-08-26T13:18:37

Yes indeed! Longest word you can type without the top row (middle row, bottom row), from my wordlist.
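The inverse search, the longest word that avoids a row entirely, is a small change to the usual filter. A minimal Python sketch, assuming the standard QWERTY rows; the short word list in the example is made up.

```python
ROWS = {"top": "qwertyuiop", "middle": "asdfghjkl", "bottom": "zxcvbnm"}

def avoids_row(word, row):
    """True if the word uses no letter from the named row."""
    return not (set(word.lower()) & set(ROWS[row]))

def longest_avoiding(words, row):
    """Return the longest word that avoids the row, or None if there is none."""
    candidates = [w for w in words if avoids_row(w, row)]
    return max(candidates, key=len, default=None)

print(longest_avoiding(["canvasbacks", "flask", "typewriter"], "top"))
```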

probably_wrong · 2023-08-24T23:14:29

If you are interested in full sentences there's a "What if?" [1] that explains how to generate sentences using a single row of your keyboard (with a link to code [2]) or stranger stuff like "We reserved seats at a secret Starcraft fest".

[1] https://what-if.xkcd.com/75/

[2] https://xkcd.com/markov.py.txt

kragen · 2023-08-24T22:18:15

for this kind of thing, aside from of course /usr/share/dict/words or /usr/share/dict/spanish, i commonly use a word list sorted by occurrences in the british national corpus which i keep at http://canonical.org/~kragen/sw/wordlist

this allows you to, among other things, tune the comprehensiveness/accuracy tradeoff to your liking for a particular task by cutting the list off at a given point

probably i should download the google 1-grams now that i have a bigger disk

anyway

   $ grep ' [qwertyuiop]*$' ~/wordlist | perl -lane 'print length $F[1], " $F[1]"' | sort -n

indeed suggests only

    10 perpetuity
    10 proprietor
    10 repertoire
    10 typewriter

possibly a more practical problem is, what are the most common words you can type entirely with the left hand while your other hand is on the mouse†; 'redraw' was a significant one with early versions of autocad

    $ grep ' [qwertasdfgzxcvb]*$' ~/wordlist | head
    2150885 a
    923975 was
    664780 be
    478178 at
    470949 are

uh that's somewhat boring

    $ grep ' [qwertasdfgzxcvb]*$' ~/wordlist | perl -ane 'print "$F[1] " if length $F[1] > 6' | fmt | head
    greater started effects address average regarded created treated affected
    greatest streets readers afterwards referred database degrees dressed
    arrested attract decades attracted addressed stressed stewart awarded
    targets estates abstract assessed terrace reserve reverse barbara reserves
    careers defeated dragged grabbed creates greeted secrets bastard extract
    edwards detected deserve affects traders adverse retreat addresses rewards
    aggregate decrease scattered breasts referee databases deserted actress
    debates deserves reversed deserved defects reserved fastest rewarded
    steward battered erected deceased exaggerated cassette exceeded servers
    warfare extracts drawers stresses asserted reacted sweater regards stabbed

that's a bit better. how about words you can type alternating the two hands, so you can type faster

    $ egrep ' [^qwertasdfgzxcvb]?([qwertasdfgzxcvb][^qwertasdfgzxcvb])*[qwertasdfgzxcvb]?$' ~/wordlist | perl -ane 'print "$F[1] " if length $F[1] > 6' | fmt | head -4
    problem problems england chairman alright element ancient visible penalty
    quantity visitor signals amendment claudia bicycle authentic antibody
    malaysia naughty dickens entitlement antique paisley rituals auditor
    endowment blanche chairmen siemens chaotic suspend uruguay mcleish

it's a curious experience to touch-type a sequence of these words because after a while you notice that something is unusual. the one-handed words are a bit more conspicuous. try typing 'a better career award as edward created database facts after we agreed we stared at dear steve' into the comment box, it's super weird
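the two hand tests above are short enough to write out without regexes. a minimal python sketch, assuming the same left-hand key set as the grep patterns and, like them, treating every other character as right-hand:

```python
LEFT = set("qwertasdfgzxcvb")  # same left-hand keys as the grep patterns

def left_only(word):
    """True if every letter is typed with the left hand."""
    return all(ch in LEFT for ch in word)

def alternates(word):
    """True if consecutive letters are always typed with different hands."""
    sides = [ch in LEFT for ch in word]
    return all(a != b for a, b in zip(sides, sides[1:]))

sentence = ("a better career award as edward created database facts "
            "after we agreed we stared at dear steve")
print(all(left_only(w) for w in sentence.split()))
print(alternates("problem"), alternates("better"))
```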

of course the bnc has some built-in biases which dramatically understate the frequency of certain words

    $ grep fuck ~/wordlist
    2568 fucking
    1236 fuck
    158 fucked
    53 fucker
    23 fucks
    20 fuckers
    13 motherfucker
    5 motherfuckers
    5 fuckin

'ngl' appears but only 5 times. 'yeet' doesn't occur at all

______

† mouse?

kragen · 2023-08-26T19:54:10

out of curiosity today i reimplemented this

   $ grep ' [qwertyuiop]*$' ~/wordlist | perl -lane 'print length $F[1], " $F[1]"' | sort -n

in common lisp

    (defconstant *wordlist*                 
      (with-open-file (in #P"~/wordlist")   
        (loop for line = (read-line in nil) 
              while line                    
              for p = (position #\space line :from-end t)  
              collect (list (parse-integer (subseq line 0 p)) 
                            (subseq line (1+ p))))))          

    (defconstant *topwords*
      (let ((w (loop for (freq word) in *wordlist*  
                     if (loop for c across word always (find c "qwertyuiop")) 
                     collect word)))
        (sort w #'< :key #'length)))

those last four lines of code seem very acceptable to me but not really competitive with the unix approach for interactive experimentation

http://canonical.org/~kragen/sw/dev3/handwords.lisp

lispm · 2023-08-26T23:08:40

The standard says that WHILE can't appear before FOR in LOOP.

(parse-integer (subseq line 0 p)) can be shortened to (parse-integer line :end p); compare (parse-integer "12345678" :end 3)

ben0x539 · 2023-08-24T22:52:05

I wonder how fast you could get at typing everything onehanded with some modifier key that mirrors the keyboard. Right hand is probably more useful there.

jodrellblank · 2023-08-26T04:51:42

Douglas Engelbart's invention of the mouse, and the Mother of All Demos in 1968, had a mouse for the right hand and a chording keyboard for the left hand. Seen here: https://youtu.be/UhpTiWyVa6k?t=1949

(It's still amazing to watch him explain that they don't look at the mouse while moving it, they look at the pointer).

Symbiote · 2023-08-25T23:13:16

I have seen that as an input device for a one-handed person.

I don't know how it compares to left-handed (or right-handed) Dvorak.

https://en.wikipedia.org/wiki/One_hand_typing

https://en.wikipedia.org/wiki/Dvorak_keyboard_layout#One-han...

kragen · 2023-08-25T00:26:01

i did try that by patching my x server 21 years ago

http://web.archive.org/web/20090727071756/http://lists.canon...

my hack was kind of half-assed but worked well enough that i was convinced that it was a bad idea, so i decided not to do the second buttock

ben0x539 · 2023-08-25T05:53:06

That's sick, thanks for sharing!

kragen · 2023-08-25T17:03:51

thanks i think

bombcar · 2023-08-24T23:03:41

That’s getting close to a chording keyboard which can be very fast.

kragen · 2023-08-25T01:01:50

it's sort of the opposite

the critical path for fast typing is the precision with which you can synchronize the motions of different fingers, and in particular different hands; if keydown events happen in the wrong order, you start to get tranpsosition erorrs

chording keyboards don't care in which order the keys in the chord start; they only care what the set of keys in the chord is and when it ends (so they can stop looking for new keys to add to the set). fast typists on chording keyboards can do 300 words per minute, which is about 7 chords per second, which is about as many "strokes" as a normal typist on a non-chording keyboard, just with chords (producing a syllable each) instead of individual keys (producing a letter each)

adding a required modifier key that has to happen before a keystroke, and end before the next keystroke, is the opposite; it adds more things you have to sequence correctly. in the half-keyboard patch case, in particular, it adds 50% more things. this slows down your typing by about a third
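the arithmetic behind those figures can be checked in a few lines. this sketch assumes the typist produces input events at a constant rate, so that 50% more events per letter means proportionally fewer letters per second; that constant-rate model is an assumption here

```python
# figures quoted in the comment above
words_per_minute = 300
chords_per_second = 7

words_per_second = words_per_minute / 60                  # 5 words each second
chords_per_word = chords_per_second / words_per_second    # implied chords per word

# half-keyboard case: 50% more events to sequence per letter,
# at an assumed constant event rate
extra_events = 0.5
relative_speed = 1 / (1 + extra_events)   # fraction of the original speed kept
slowdown = 1 - relative_speed             # fraction of speed lost

print(words_per_second, round(chords_per_word, 2), round(slowdown, 3))
```

so the quoted numbers imply roughly 1.4 chords per word, and 50% more events costs about a third of the speed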

elchief · 2023-08-24T21:38:28

how about on the left side of the keyboard? STEWARDESSES is the longest i can think of

sshine · 2023-08-24T22:02:13

  exaggerated
  stewardesse

colecut · 2023-08-24T21:40:26

extra points for only using 2 rows

qup · 2023-08-24T21:47:58

Considering the keys, maybe you should get extra points if you use all three instead.

mensetmanusman · 2023-08-24T22:25:20

It is lollipop for those interested in the longest for the right hand.

san-fran · 2023-08-24T22:45:43

I don’t know if it’s true, but I read that “stewardesses” is the longest for the left hand

kragen · 2023-08-24T22:27:18

polyphony and homophony are both one letter longer

jimbob45 · 2023-08-25T21:43:50

Am I the only one who types 'y's with my left hand?

kragen · 2023-08-25T22:28:39

i probably do it more often than i'd like to admit but officially...