I like slicing and dicing with awk, grep and friends too. One thing I find odd t...

andrus · on July 15, 2013

Just for grins:

    $ alias sum="xargs | tr ' ' '+' | bc"
    $ echo -e "1\n2\n3\n" | sum
    6

gnosis · on July 15, 2013

Here's one way to do this with the standard "dc" (RPN calculator) utility:

  printf '1\n2\n3\n+\n+\np\n' | dc -

Or, a little more legibly:

[Edited to fix bug]

Not sure how to automate this to sum 1000 values without needing to explicitly insert 999 + signs, though. Haven't explored dc in depth myself yet. There's probably some way to do it with a macro or something, but it may not be pretty.
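
One possible way to avoid typing the + signs, offered as a sketch rather than something from the thread: append a small recursive macro that keeps adding while more than one value is left on the stack. The function name dc_sum is made up for illustration. The sketch assumes GNU dc and non-negative inputs, since dc writes negative numbers with a leading _ instead of -.

```shell
# dc_sum: sum newline-separated numbers from stdin with dc.
# Hypothetical helper name; assumes dc is installed.
dc_sum() {
    # [+z1<r]sr  store a macro in register r: add the top two values,
    #            then re-run r while the stack depth (z) is greater than 1
    # z1<r       start the loop only if there are at least two values
    # p          print the single value left on the stack
    { cat; echo '[+z1<r]sr z1<r p'; } | dc
}
```

Usage would be along the lines of seq 1 1000 | dc_sum, with no explicit + signs in the input.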

mjn · on July 15, 2013

If you consider the use of dc/bc as in the other solutions to be cheating, you can use unary-encoded integers...

   alias sum='xargs -I{} sh -c "head -c {} < /dev/zero" | wc -c'

chubot · on July 15, 2013

Yeah I wrote my own sum utility in Python... the syntax is just sum 1 or sum 2 for the column, with a -d delimiter flag. In retrospect I guess it could have been a one line awk script. But yeah if you are doing this kind of data-processing, it makes sense to have a hg/git repo of aliases and tiny commands that you sync around from machine to machine. You shouldn't have to write the sum more than once.
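
For reference, the awk version hinted at above might look roughly like this. It is a sketch, not chubot's actual utility: the name colsum is hypothetical (chosen so it does not shadow the system's sum checksum tool), and only the column argument and the -d flag described in the comment are modeled.

```shell
# colsum [-d DELIM] COLUMN
# Sum one column of stdin. Without -d, awk's default splitting on runs of
# whitespace is used. The body runs in a subshell so variables do not leak.
colsum() (
    fs=''
    if [ "$1" = "-d" ]; then
        fs=$2
        shift 2
    fi
    col=${1:-1}    # default to the first column
    if [ -n "$fs" ]; then
        awk -F "$fs" -v c="$col" '{ s += $c } END { print s + 0 }'
    else
        awk -v c="$col" '{ s += $c } END { print s + 0 }'
    fi
)
```

So colsum 2 < data.txt would total the second whitespace-separated column, and colsum -d , 2 the second comma-separated one.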

Another useful one is "hist" which is sort | uniq -c | sort -n -r.
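
As a shell function that can live in a synced dotfiles repo, that pipeline could be wrapped like this (a minimal sketch; a function is used instead of an alias so it also works in scripts):

```shell
# hist: count identical lines on stdin, most frequent first.
hist() {
    sort | uniq -c | sort -n -r
}
```

Something like awk '{print $3}' logfile | hist then gives a quick frequency table for one column.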

jbert · on July 15, 2013

I've never aliased it, but yes I use your 'hist' a lot. Useful for things like "categorise log errors" etc.

Does everyone else edit command history, stacking up 'grep -v xxxx' in the pipeline to remove noise?

If I'm working on a new pipeline, my normal workflow is something like:

  head file     # See some representative lines
  head file | grep goodstuff
  head file | grep goodstuff | grep -v badstuff
  head file | grep ... | grep ... | sed -e 's/cut out/bits/' -e 's/i dont/want/'
  head file | grep ... | grep ... | sed -e 's/cut out/bits/' -e 's/i dont/want/' | awk '{print $3}' # get a col
  head file | grep ... | grep ... | sed -e 's/cut out/bits/' -e 's/i dont/want/' | awk '{print $3}' | sort | uniq -c | sort -nr  # histogram as parent

Then I edit the 'head' into a 'cat' and handle the whole file. Basically all done with bash history editing (I'm a 'set -o vi' person for vi keybindings in bash, emacs is fine too :-)

mjn · on July 15, 2013

Yeah, this is my quick-and-dirty way of looking at referers in Apache logs, built up from a few history edits. It excludes some bot-like stuff (many bots give a plus-prefixed URL in the user-agent string) and referer strings from my own domain, removes query strings, and cleans up trailing slashes:

   grep -v "+http" access_log | cut -d \" -f 4 | cut -d \? -f 1 | sed 's/\/$//' | grep -v kmjn.org | sort | uniq -c | sort -nr

ibotty · on July 15, 2013

> awk '{print $3}'

is the same as

> cut -f3 -d' '

cut is amazing for what it does, and most people only know the subset of awk that effectively _is_ cut anyway :D

toupeira · on July 15, 2013

Not exactly: awk treats any run of whitespace as a single separator, while cut -d' ' splits on each individual space character and ignores tabs.
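
A small illustration of the difference, using a made-up input line with two spaces between the first two fields. The helper names are hypothetical:

```shell
# Three ways to ask for "the third field" of each line on stdin.
third_awk()          { awk '{ print $3 }'; }
third_cut()          { cut -d' ' -f3; }
third_cut_squeezed() { tr -s ' ' | cut -d' ' -f3; }

line='a  b c'    # note the double space after "a"
printf '%s\n' "$line" | third_awk            # prints c
printf '%s\n' "$line" | third_cut            # prints b (field 2 is empty)
printf '%s\n' "$line" | third_cut_squeezed   # prints c
```

Squeezing repeated spaces with tr -s first makes cut agree with awk here, though leading spaces or tabs would still be treated differently.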

Sprint · on July 16, 2013

Bundling expressions into regular expressions can be handy, for example "grep -Ev '(thisbot|thatbot|bingbot|bongbot)'" instead of many single grep pipes.
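
Side by side, the two forms might look like this (a sketch with hypothetical function names; the bot names are the placeholders from the comment above):

```shell
# One grep with an alternation.
drop_bots() {
    grep -Ev '(thisbot|thatbot|bingbot|bongbot)'
}

# The equivalent chain of single-pattern greps.
drop_bots_chain() {
    grep -v thisbot | grep -v thatbot | grep -v bingbot | grep -v bongbot
}
```

Both filter out the same lines; the single grep starts one process instead of four and is easier to extend from shell history.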

atondwal · on July 15, 2013

    echo "1\n2\n3\n" | tr '\n' + | bc

FreeFull · on July 16, 2013

Doesn't work. By default, echo doesn't translate \n into a newline, so you have to add the -e flag. Then, bc doesn't like the extra plusses at the end, so you have to either add the -n flag to echo and remove the last \n, or somehow trim the newlines from the end beforehand.
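
One way to sidestep both problems, as a sketch with hypothetical function names: use printf instead of echo so the escapes behave the same in every shell, and append a 0 so the trailing plus has something to add. It assumes bc is installed.

```shell
# plus_expr: turn newline-separated numbers on stdin into one expression.
plus_expr() {
    # Drop blank lines, turn each newline into '+', then append 0 so the
    # final '+' has a right-hand operand, e.g. "1+2+3+0".
    { grep -v '^$' | tr '\n' '+'; echo 0; }
}

# sum_lines: evaluate that expression with bc.
sum_lines() {
    plus_expr | bc
}
```

With this, printf '1\n2\n3\n' | sum_lines hands bc the expression 1+2+3+0.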

jbert · on July 16, 2013

Thanks for the detail. I was worried about the escapes in the echo, but didn't check.

phaemon · on July 15, 2013

You could do:

    echo -e "1\n2\n3\n4" | paste -sd+ | bc

kind of cheating though! :-)

whacker · on July 15, 2013

    paste -sd+|bc
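
A slightly more portable spelling, sketched with hypothetical function names: POSIX paste expects a file operand, so on BSD and macOS systems stdin has to be named explicitly with a trailing -. It assumes bc is installed.

```shell
# join_plus: join the lines of stdin with '+', e.g. "1+2+3+4".
join_plus() {
    paste -s -d + -
}

# sum_paste: evaluate the joined expression with bc.
sum_paste() {
    join_plus | bc
}
```

Unlike the tr approach, paste puts separators only between lines, so there is no trailing plus to clean up.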