This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the algorithms category.
Last Updated: 2024-10-12
For example, I wanted to find the biggest files in the project_s repo, so I ran du (which reports disk usage per directory):
$ du project_s
192 ./.composer/cache/files/phpspec
16 ./.composer/cache/files/fideloper/proxy
1632 ./.composer/cache/files/maximebf
84496 ./.composer/cache/files
854328 ./.composer/cache/repo/https---repo.packagist.org
854328 ./.composer/cache/repo
938832 ./.composer/cache
938840 ./.composer
2810896 .
Next I sorted the output by piping it into sort, keyed on the first field:
$ du project_s | sort -k 1
968 ./node_modules/jsdom/node_modules/acorn/dist
9696 ./node_modules/terser
971280 ./node_modules
976 ./node_modules/handlebars/dist/amd/handlebars/compiler
984 ./node_modules/array-includes/node_modules/es-abstract/2019
9912 ./node_modules/lodash
992 ./vendor/phpunit/phpunit/tests/end-to-end/regression
As you can see, the order was not what I expected, because sort compares keys as text by default, not as numbers. I therefore had to tell sort to compare numerically with sort -k 1 -n:
$ du project_s | sort -k 1 -n
968 ./node_modules/jsdom/node_modules/acorn/dist
976 ./node_modules/handlebars/dist/amd/handlebars/compiler
984 ./node_modules/array-includes/node_modules/es-abstract/2019
992 ./vendor/phpunit/phpunit/tests/end-to-end/regression
9696 ./node_modules/terser
9912 ./node_modules/lodash
971280 ./node_modules
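The difference between the two runs above can be reproduced on a few lines. This is a minimal sketch, assuming a POSIX shell and a standard sort; the three sample lines are taken from the listing above, and the variable names (sample, as_text, as_numbers) are only for illustration. LC_ALL=C is set so the text comparison is plain byte order regardless of locale.

```shell
# Three du-style lines: size in the first field, path in the second.
sample='9696 ./node_modules/terser
971280 ./node_modules
976 ./node_modules/handlebars/dist/amd/handlebars/compiler'

# Default comparison is character by character, so "9696" < "971280" < "976".
as_text=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 1)

# -k1,1n limits the key to the first field and compares it as a number.
as_numbers=$(printf '%s\n' "$sample" | LC_ALL=C sort -k1,1n)

printf 'text order:\n%s\n\nnumeric order:\n%s\n' "$as_text" "$as_numbers"
```

To see the largest entries first, the numeric sort can be reversed and cut off after the top few lines, e.g. du project_s | sort -rn | head.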
Note that this applies within vim too - e.g. ! sort -n
ep-1.mp4
ep-10.mp4
ep-12.mp4
ep-2.mp4
ep-25.mp4
ep-29.mp4
ep-3.mp4
ep-30.mp4
ep-36.mp4
ep-37.mp4
ep-38.mp4
ep-39.mp4
ep-4.mp4
ep-40.mp4
ep-5.mp4
ep-6.mp4
ep-7.mp4
The above results stayed the same with sort -n.
And this has nothing to do with the prefixes - when removed and sorted with -n, we get
1.mp4
10.mp4
12.mp4
2.mp4
25.mp4
29.mp4
The issue appears to be the differing number of digits: the order above puts 10 before 2. The right solution was sort -V (version sort):
ep-1.mp4
ep-2.mp4
ep-3.mp4
ep-4.mp4
ep-5.mp4
ep-6.mp4
ep-7.mp4
ep-8.mp4
ep-9.mp4
ep-10.mp4
ep-12.mp4
ep-13.mp4
ep-14.mp4
ep-25.mp4
ep-26.mp4
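A small, self-contained way to compare the two flags on names like these. This is only a sketch: it assumes a sort that supports -V (GNU coreutils does), the four episode names are made up for illustration, and LC_ALL=C pins the text comparison to byte order.

```shell
# Hypothetical episode names, deliberately out of order.
episodes='ep-10.mp4
ep-2.mp4
ep-1.mp4
ep-12.mp4'

# With -n these names still come out in plain text order (ep-10 before ep-2).
numeric=$(printf '%s\n' "$episodes" | LC_ALL=C sort -n)

# -V splits each name into text runs and digit runs and compares
# the digit runs as numbers, so 2 comes before 10.
version=$(printf '%s\n' "$episodes" | LC_ALL=C sort -V)

printf 'sort -n:\n%s\n\nsort -V:\n%s\n' "$numeric" "$version"
```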
From the docs: -V will give this ordering:
sort-1.022.tgz
sort-1.23.tgz
sort-1.23.1.tgz
sort-1.024.tgz
sort-1.024.003.
sort-1.024.003.tgz
sort-1.024.07.tgz
sort-1.024.009.tgz
TLDR: For filenames and other text with numbers embedded in it, -V is most likely what is needed.
When I started capitalizing acronyms on the SemicolonAndSons website, I noticed that on the code diary pages titles like "CORS big picture" would come before "all you need to know about sessions" (i.e. before links to entries whose title starts with a lowercase letter). The issue was that the sort algorithm places every uppercase letter before even a lowercase "a". The solution was to sort on an upcased copy of each title, which puts all titles on an even playing field (note that upcase uppercases the whole string, not just the first letter):
entries.sort_by {|e| e[:name].upcase }
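The same effect shows up with the sort command used earlier in this entry, where the -f flag plays the role of upcase. This is a rough shell analogue of the Ruby line, not the site's actual code; the variable names are illustrative and LC_ALL=C forces plain byte-order comparison.

```shell
# The two titles from the example above.
titles='all you need to know about sessions
CORS big picture'

# Byte order: every uppercase letter sorts before every lowercase one,
# so "CORS big picture" comes first.
case_sensitive=$(printf '%s\n' "$titles" | LC_ALL=C sort)

# -f folds lowercase to uppercase for the comparison only;
# the output lines keep their original case.
case_folded=$(printf '%s\n' "$titles" | LC_ALL=C sort -f)

printf 'case-sensitive:\n%s\n\ncase-folded:\n%s\n' "$case_sensitive" "$case_folded"
```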