2 Techniques - bash and awk explained

note: may not be optimal or best practice; still in training.

2.1 pipe detection

Is output of script being piped?

if [ -t 1 ] ; then nopipe_out=terminal; fi;

nopipe_out will be set to terminal if the script output is going to a terminal, i.e. it is not being sent to a pipe or redirected to a file.

This means the output will go to the terminal, so I need to be aware of line wrapping (handled in xviewread.awk).

‘-t 1’ is true if file descriptor 1 (standard output) is open and refers to a terminal:

https://www.gnu.org/software/bash/manual/html_node/Bash-Conditional-Expressions.html
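
A small sketch to try this out. The function name describe_stdout is made up for this example and is not part of the original script; it only wraps the same -t test so the piped case can be seen:

```shell
# describe_stdout is a hypothetical helper, not from the original script.
# [ -t 1 ] is true only when file descriptor 1 (stdout) is a terminal.
describe_stdout() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

# Inside a pipeline, stdout of the function is the pipe, not a terminal.
piped_result=$(describe_stdout | cat)
echo "when piped: $piped_result"
```

Calling describe_stdout directly at an interactive prompt should take the other branch and print terminal.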

::::::

2.2 how many arguments passed into script?

if [[ $# -ne 1 ]]; then
  echo "message...."
  exit 2
fi

$# is the number of arguments passed to the script. As we need the read number, one argument is required.

If this is not present a message will be displayed, and the script will exit. ‘exit 2’ is used with the 2 indicating an error occurred in running the script.

https://www.gnu.org/software/bash/manual/html_node/Exit-Status.html

All builtins return an exit status of 2 to indicate incorrect usage, generally invalid options or missing arguments. (I guess this script is not a builtin, but the 2 does indicate a missing argument.)
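
A sketch of the same check inside a function, so it can be tried without ending the shell session. The function name check_args and the message text are assumptions for this example; the function uses return 2 where the script uses exit 2:

```shell
# check_args is a hypothetical wrapper; the original script tests $# at top level.
check_args() {
  if [ "$#" -ne 1 ]; then
    echo "usage: exactly one argument (the read number) is needed" >&2
    return 2   # same status value the script passes to exit
  fi
  echo "read number: $1"
}

check_args 28                      # one argument: accepted
check_args || echo "status: $?"    # no argument: message on stderr, status 2
```

Inside a function $# counts the arguments given to the function, which is why the same test works here.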

::::::

2.3 Check if a command/program is available

command -v samtools >/dev/null 2>&1 || { echo >&2 "------ Is samtools loaded? error running samtools, exiting."; exit 1; }

Is samtools (or some other program) responding?

command -v : The -v option causes a single word indicating the command or file name used to invoke command to be displayed.

if samtools is loaded then the output is :

[]$ command -v samtools
/apps/skl/software/SAMtools/1.8/bin/samtools

and $?, the exit code is

[]$ echo $?
0

if samtools is not loaded, nothing is output:

[]$ command -v samtools
[]$

and $? the exit code is

[]$ echo $?
1
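
The same test can be wrapped in a small function and tried on a program that exists and one that does not. The names have_cmd and no_such_program_xyz are made up for this sketch:

```shell
# have_cmd is a hypothetical helper; it returns the exit status of command -v.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# sh should exist everywhere; no_such_program_xyz is assumed not to exist.
for prog in sh no_such_program_xyz; do
  if have_cmd "$prog"; then
    echo "$prog: found"
  else
    echo "$prog: not found"
  fi
done
```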

:::::

2.4 assign output of a command to a variable, reference an input argument

bamdata=$(samtools view sam.lnk | sed "${1}q;d")

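${1} is the read number we are looking for (reads are counted sequentially in the bam file).

sed "${1}q;d"

=> sed "28q;d" prints only line 28: d deletes every earlier line without printing it, and q prints line 28 and then quits, so the rest of the input is never read. That makes it one of the fastest ways to get a requested line of text from a file or pipe.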
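
A sketch that can be run without samtools. Here seq stands in for "samtools view sam.lnk" and the variable n stands in for ${1}; both are substitutions made only for this example:

```shell
# seq prints the numbers 1..100, one per line, as stand-in input.
n=28
line=$(seq 1 100 | sed "${n}q;d")
echo "$line"   # prints 28, the content of line 28
```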

:::::

2.5 Run a command and put the output into a variable.

read -r readdataD <<< "$(echo "$bamdata" | awk '{print $1}')"

read : reads a single line from the standard input (bash builtin).

readdataD : the bash variable the data will be read into.

<<< : here-string; supplies the string that follows as the standard input of the read command.

$bamdata contains a line from a file, awk print $1 returns the first field in the line.

-r option : Backslash does not act as an escape character. The backslash is considered to be part of the line. In particular, a backslash-newline pair may not be used as a line continuation.

[n]<<< word

The word undergoes tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, and quote removal. Pathname expansion and word splitting are not performed. The result is supplied as a single string, with a newline appended, to the command on its standard input (or file descriptor n if n is specified).

https://www.gnu.org/software/bash/manual/html_node/Redirections.html#Here-Strings

Though this should also work:

readdataD="$(echo "$bamdata" | awk '{print $1}')"
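
A sketch comparing the two forms on a made-up record. The bamdata value and the second variable name readdataE are invented for this example, they are not real samtools output:

```shell
# Made-up tab-separated record standing in for one line of "samtools view" output.
bamdata=$(printf 'read42\t0\tchr1\t100\t60\t5M')

# Form 1: awk output fed to read through a here-string.
read -r readdataD <<< "$(echo "$bamdata" | awk '{print $1}')"

# Form 2: plain command substitution.
readdataE="$(echo "$bamdata" | awk '{print $1}')"

echo "$readdataD"   # prints read42
echo "$readdataE"   # prints read42
```

For a single field like this the two give the same value. They can differ when the output has more than one line (read keeps only the first) or has leading or trailing whitespace (read trims it).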

:::::

2.6 strings in bash variables

length

ciglen="${#cigarD}"

if cigarD="ABCDEFGHIJKLMN", ciglen will be set to 14

substring:

firstFive="${cigarD:0:5}"
lastFive="${cigarD:$((ciglen-5)):5}"

firstFive is set to ABCDE; lastFive is set to JKLMN.
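
The whole example as one runnable sketch. The last assignment (lastFiveAlt, a name made up here) shows an alternative that is not in the original notes: a negative offset, which bash counts from the end of the string:

```shell
cigarD="ABCDEFGHIJKLMN"
ciglen="${#cigarD}"                      # length of the string
firstFive="${cigarD:0:5}"                # 5 characters starting at offset 0
lastFive="${cigarD:$((ciglen-5)):5}"     # 5 characters starting at offset length-5

# Alternative: negative offset; the space before the minus sign is required,
# otherwise bash reads :- as the "use default value" expansion.
lastFiveAlt="${cigarD: -5}"

echo "$ciglen $firstFive $lastFive $lastFiveAlt"   # prints 14 ABCDE JKLMN JKLMN
```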

2.7 misc

Calling awk from bash:

awk -v cigA="$cigarD" -f cigtoRefLen.awk

-v assigns the value of the bash variable cigarD to the awk variable cigA, so it is available in the awk script.

-f specifies the awk script file to run.
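
A sketch of -v on its own. It uses an inline awk program instead of -f because the contents of cigtoRefLen.awk are not shown in these notes; the CIGAR value and the printed text are made up for the example:

```shell
cigarD="5M2I3M"   # made-up CIGAR string

# The value of the bash variable is visible inside awk as cigA,
# already in the BEGIN block, before any input is read.
result=$(awk -v cigA="$cigarD" 'BEGIN { print "cigA is " cigA ", length " length(cigA) }')
echo "$result"    # prints: cigA is 5M2I3M, length 6
```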

AWK: NR>2{…} tells awk to run the commands in curly braces only if the record (line) number, NR, is greater than two. This has the effect of skipping the first two lines, here the header lines.
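
A sketch with made-up input of two header lines and two data lines; print is used as the action only for illustration:

```shell
# The first two lines are skipped because NR is 1 and 2 for them.
filtered=$(printf 'header1\nheader2\nrow1\nrow2\n' | awk 'NR>2 { print }')
echo "$filtered"   # prints row1 and row2, one per line
```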