I was sitting around in meetings yesterday and remembered an old shell script I had for counting the number of unique lines in a file. I gave it a shot in Rust, and with a little bit of (over-engineering)™ I managed to get 25x the throughput of the naive approach using coreutils, as well as an improvement over some existing tools.

Some notes on the improvements:

1. Using csv (serde) for writing leads to some big gains.

2. Arena allocation of incoming keys, plus storing references in the hashmap instead of owned values, heavily reduced the number of allocations and probably improved cache efficiency (I'm guessing, I did not measure).
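
To illustrate the idea in point 2, here is a minimal sketch, not the tool's actual code. It swaps in a simpler technique than a real arena: the whole input lives in one buffer, and the hashmap stores `&str` slices borrowed from that buffer, so no `String` is allocated per line. The function name `count_unique_lines` and the sample input are made up for the example.

```rust
use std::collections::HashMap;

/// Count how often each distinct line occurs in `input`.
/// Keys are `&str` slices borrowed from `input` (hypothetical helper,
/// standing in for arena-backed keys), so no per-line String is allocated.
fn count_unique_lines(input: &str) -> HashMap<&str, u64> {
    let mut counts: HashMap<&str, u64> = HashMap::new();
    for line in input.lines() {
        // `entry` inserts the borrowed slice on first sight, then bumps the count.
        *counts.entry(line).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // One buffer owns all the bytes; the map only points into it.
    let input = String::from("apple\nbanana\napple\ncherry\nbanana\napple\n");
    let counts = count_unique_lines(&input);

    // Sort by count (descending), then by line, for stable output.
    let mut rows: Vec<(&str, u64)> = counts.into_iter().collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    for (line, n) in rows {
        println!("{n}\t{line}");
    }
}
```

The trade-off is that the buffer has to outlive the map. For streamed input that does not fit in one buffer, an arena that hands out stable references serves the same purpose.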

There is some regex functionality and some table filtering built in as well.

Happy hacking!
