Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I like slicing and dicing with awk, grep and friends too.

One thing I find odd that you have to drop to a full language (awk, perl etc) to sum a column of numbers. Am I missing a utility?

  echo "1\n2\n3\n" | sum # should print 6 with hyphothetical sum command
I suppose more generally you could have a 'fold initial op' and:

  echo "1\n2\n3\n4\n" | fold 0 +   # should print 10
  echo "1\n2\n3\n4\n" | fold 1 \*  # should print 24
But I guess at that point you're close enough to using awk/perk/whatever anyway. Which probably answers my question.


Just for grins:

    $ alias sum="xargs | tr ' ' '+' | bc"
    $ echo -e "1\n2\n3\n" | sum
    6


Here's one way to do this with the standard "dc" (RPN calculator) utility:

  echo "1\n2\n3\n+\n+\np\n" | dc -
Or, a little more legibly:

  > dc
  1
  2
  3
  +
  +
  p
  6
[Edited to fix bug]

Not sure how to automate this to sum 1000 values without needing to explicitly insert 999 + signs, though. Haven't explored dc in depth myself yet. There's probably some way to do it with a macro or something, but it may not be pretty.


If you consider the use of dc/bc as in the other solutions to be cheating, you can use unary-encoded integers...

   alias sum='xargs -I{} sh -c "head -c {} < /dev/zero" | wc -c'


Yeah I wrote my own sum utility in Python... the syntax is just sum 1 or sum 2 for the column, with a -d delimiter flag. In retrospect I guess it could have been a one line awk script. But yeah if you are doing this kind of data-processing, it makes sense to have a hg/git repo of aliases and tiny commands that you sync around from machine to machine. You shouldn't have to write the sum more than once.

Another useful one is "hist" which is sort | uniq -c | sort -n -r.


I've never aliased it, but yes I use your 'hist' a lot. Useful for things like "categorise log errors" etc.

Does everyone else edit command history, stacking up 'grep -v xxxx' in the pipeline to remove noise?

If I'm working on a new pipeline, my normal workflow is something like:

  head file     # See some representative lines
  head file | grep goodstuff
  head file | grep good stuff | grep -v badstuff
  head file | grep ... | grep ... | sed -e 's/cut out/bits/' -e 's/i dont/want/'
  head file | grep ... | grep ... | sed -e 's/cut out/bits/' -e 's/i dont/want/' | awk '{print $3}' # get a col
  head file | grep ... | grep ... | sed -e 's/cut out/bits/' -e 's/i dont/want/' | awk '{print $3}' | sort | uniq -c | sort -nr  # histogram as parent
Then I edit the 'head' into a 'cat' and handle the whole file. Basically all done with bash history editing (I'm a 'set -o vi' person for vi keybindings in bash, emacs is fine too :-)


Yeah, this is my quick-and-dirty way of looking at referers in Apache logs, built up from a few history edits. It excludes some bot-like stuff (many bots give a plus-prefixed URL in the user-agent string) and referer strings from my own domain, removes query strings, and cleans up trailing slashes:

   grep -v "+http" access_log | cut -d \" -f 4 | cut -d \? -f 1 | sed 's/\/$//' | grep -v kmjn.org | sort | uniq -c | sort -nr


> awk '{print $3}'

is the same as

> cut -f3 -d' '

cut is amazing for what it does. and most people know only the subset of awk that effectively _is_ cut anyway :D.


Not exactly, awk will consume all whitespace while cut will split on each individual space character, and not on newlines and tabs.


Bundling expressions into regular expressions can be handy, for example "grep -Ev '(thisbot|thatbot|bingbot|bongbot)'" instead many single grep pipes.


    echo "1\n2\n3\n" | tr '\n' + | bc


Doesn't work. By default, echo doesn't translate \n into a newline, so you have to add the -e flag. Then, bc doesn't like the extra plusses at the end, so you have to either add the -n flag to echo and remove the last \n, or somehow trim the newlines from the end beforehand.


Thanks for the detail. I was worried about the escapes in the echo, but didn't check.


You could do:

echo -e "1\n2\n3\n4" | paste -sd+ | bc

kind of cheating though! :-)


    paste -sd+|bc




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: