Post by demonview in Lil Programming Questions

I have this Lil code to print a table of 10 English words that remain words when the 'r's are removed from the word:

words:wods:still:0
each w in "" drop "\n" split read["words"] # /usr/share/dict/words"]
  words[w]:1
  if "r" in w
    wods["" fuse "r" drop w]:w
  end
end
each v k in wods
  if k in words
    still[v]:k
  end
end
show[select word:value wod:wods@value from random[still -10]]

which produces output like

+--------------+---------------+
| word         | wod           |
+--------------+---------------+
| "evaluated"  | "revaluated"  |
| "emigated"   | "emigrated"   |
| "boated"     | "borated"     |
| "estated"    | "restated"    |
| "expatiated" | "expatriated" |
| "elatedness" | "relatedness" |
| "pedated"    | "predated"    |
| "ungated"    | "ungrated"    |
| "pated"      | "prated"      |
| "gated"      | "grated"      |
+--------------+---------------+

When reading a 'words' file that contains only the 3k words that include "ated", this runs in 11s (lila.awk) or 16s (lilt). So actually reading /usr/share/dict/words, with its half a million words, is pretty infeasible.

Is there a much better way to do this in Lil, or is this sort of batch processing out of scope for Lil?

Oh, I had a second question about selecting from an array using an array of booleans, but I found that 'select' can do that, which gives this code that runs in 22ms in lilt instead of 16s:

words:extract value where 5<count@value from "\n" split read["words"]
still:select word:value wod:(on f x do "" fuse "r" drop x end)@value
    where (on f x do ("" fuse "r" drop x) in words end)@value
    where (on f x do "r" in x end)@value
  from words
show[table random[still -10]]

But this needs 4s to process only 50k words, so the full dictionary is still out.

Internet Janitor, 282 days ago (1 edit) (+1)

Hmm. Could be tricky to make this fast in Lil.

In general, using queries is much more efficient than loops. You can slightly simplify

where (on f x do "r" in x end)@value

to

where value like "*r*"

and you could hoist that "in" out of the loop, since it accepts a list as a left argument. Together, these ideas can make the query much more concise:

on strip x do "" fuse "r" drop x end
still:select word:value wod:strip@value
 where (strip@value) in words
 where value like "*r*"
 from words

...but probably not much faster.
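
As a small illustration of the two ideas above, here is a sketch on a made-up three-word list (the sample words are an assumption, not taken from any dictionary file). Both `like` and `in` are applied to a whole list at once, so each should yield one 0/1 flag per element rather than needing a per-word function call:

```lil
# hypothetical sample data, standing in for the real word list
words:("grated","gated","boat")
show[words like "*r*"]          # flags the words that contain an "r"
show[("gated","ust") in words]  # flags the candidates that are themselves words
```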

You can also avoid stripping the r's twice by using a "subquery"

still:select where wod in words from
 select word:value wod:strip@value
 where value like "*r*"
 from words
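
To try the subquery form without a dictionary file, a minimal sketch with a hypothetical six-word list might look like the following. The sample words are an assumption chosen for illustration; `strip` is the same helper defined earlier in this reply, repeated here so the snippet stands alone:

```lil
# hypothetical sample data, standing in for "\n" split read["words"]
words:("grated","gated","prated","pated","boat","rust")

# remove every "r" from a word
on strip x do "" fuse "r" drop x end

# inner query: keep only words containing "r" and pair each with its stripped form
# outer query: keep only the rows whose stripped form is itself in the word list
still:select where wod in words from
 select word:value wod:strip@value
 where value like "*r*"
 from words
show[still]
```

With this sample, the inner query should see "grated", "prated" and "rust", and the outer filter should keep only the first two, since "ust" is not in the list.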

demonview, 282 days ago (+2)

wods:"\n" split "r" drop words:read["/usr/share/dict/words"]
words:"\n" split words
still:select word:words@index wod:value
  where (wods in words)*(extract value like "*r*" from words)
  from wods
show[table random[still -10]]

This gets an answer in 13min, or in 880ms without the 'wods in words' test. I've tried a few alternatives (words dict 1, readdeck of a grid, parsing and reading a JSON of a table) and nothing seems to cut that down. It seems like building large tables is slow. Maybe due to the allocator?

But, this was only for learning purposes and I gained a better appreciation of the query language from it.

Internet Janitor, 279 days ago (+2)

I made some localized improvements to C-Lil's implementation of the "in" operator. Using this dictionary file:

https://github.com/dwyl/english-words/blob/master/words_alpha.txt

and this version of the entire script:

words:extract where 5<count@value from "\n" split read["words_alpha.txt"]
on strip x do "" fuse "r" drop x end
still:select where wod in words from
 select word:value wod:strip@value
 where value like "*r*" from words
show[table random[still -10]]

The patch brings execution time on my laptop from about 12 minutes (oof) to about half a second.
