Friday, September 16, 2005

Within sorted files, determine if gap exists

Re: Within sorted files, determine if gap exists

Hi, try this:
katiyar@/export/home/katiyar/test> cat sorted_file
HRND.A050401000097OCC_177.raw.747.OCA
HRND.A050401010001OCC_177.raw.748.OCA
HRND.A050401020005OCC_177.raw.749.OCA
HRND.A050401030006OCC_177.raw.750.OCA
HRND.A050401040007OCC_177.raw.751.OCA
HRND.A050401050008OCC_177.raw.752.OCA
HRND.A050401060009OCC_177.raw.753.OCA
HRND.A050401070010OCC_177.raw.754.OCA
HRND.A050401080011OCC_177.raw.755.OCA
HRND.A050401090012OCC_177.raw.756.OCA
HRND.A050401100013OCC_177.raw.757.OCA
HRND.A050401110014OCC_177.raw.758.OCA
HRND.A050401130016OCC_177.raw.760.OCA
HRND.A050401140017OCC_177.raw.761.OCA
HRND.A050401150019OCC_177.raw.762.OCA
HRND.A050401160020OCC_177.raw.763.OCA
katiyar@/export/home/katiyar/test> cat script.sh
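#!/bin/sh
# Print the first missing number wherever the 4th "."-separated field
# does not go up by exactly 1 from one line to the next.
next=2
prev=1
lines=`wc -l < sorted_file`   # "<" makes wc print only the count, not the file name
while [ $next -le $lines ]    # -le so the last line is compared as well
do
    val1=`eval sed -n \'$prev p\' sorted_file | cut -d. -f4`
    val2=`eval sed -n \'$next p\' sorted_file | cut -d. -f4`
    val1=`expr $val1 + 1`
    if [ $val1 -ne $val2 ]; then
        echo "Missing file number : $val1"
    fi
    prev=$next
    next=`expr $next + 1`
done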
katiyar@/export/home/katiyar/test> ./script.sh
Missing file number : 759
katiyar@/export/home/katiyar/test>

Hope it helps,
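The script above runs sed twice for every line, rereading the file each time, and when several consecutive numbers are missing it reports only the first one. Below is a minimal sketch of a single-pass alternative in the same Bourne-shell style. It assumes the counter is always the 4th dot-separated field and that the file is sorted by it; the function name find_gaps is hypothetical.

```shell
# Sketch (not the original poster's script): read the file once with
# "while read" and report every number inside a gap, not just the first.
# Assumes the counter is the 4th "."-separated field, e.g. 747 in
# HRND.A050401000097OCC_177.raw.747.OCA, and the lines are sorted by it.
find_gaps() {
    prev=
    while IFS=. read -r f1 f2 f3 num rest
    do
        if [ -n "$prev" ]; then
            missing=`expr $prev + 1`
            # Count up from prev+1 and print each number below num.
            while [ $missing -lt $num ]
            do
                echo "Missing file number : $missing"
                missing=`expr $missing + 1`
            done
        fi
        prev=$num
    done < "$1"
}
```

Call it as `find_gaps sorted_file`; for the sample file above it prints the same single line, Missing file number : 759. expr is used instead of shell arithmetic so that counters with leading zeros (such as 008) are still read as decimal.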

Hi Vram,
Actually, I am not very comfortable with awk :)) ... Now, to answer your question
regarding '$prev p': to print a particular line of a file through sed, say the 5th,
we write
sed -n '5 p' filename
Similarly, I have the line number stored in the prev variable and want to use it
in sed. But inside single quotes the shell does not expand $prev, so sed would
receive the literal text $prev, and $ means "last line" to sed. I need the shell
to expand $prev first and then pass the resulting number to sed; putting eval
before sed does this for me. Hope I am clear. Feel free to reply if you still
have any doubts.
The -n option turns off sed's default printing of every line; without it, all
lines are printed and the selected line appears twice (once more because of p).
Hope it helps
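The eval works, but the same result can be had by putting the sed script in double quotes, where the shell expands the variable itself before running sed. A small sketch of that alternative; print_line is a hypothetical helper name, not part of the original script:

```shell
# Sketch: print line number $1 of file $2 without eval.
# Double quotes let the shell expand ${n}; the braces stop the shell
# from looking for a variable called "np".
print_line() {
    n=$1
    # -n turns off sed's automatic printing, so only line $n is output
    # (by "p"); without -n every line is printed and line $n appears twice.
    sed -n "${n}p" "$2"
}
```

In the script, the eval lines could then read val1=`print_line $prev sorted_file | cut -d. -f4`.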

Re: Within sorted files, determine if gap exists

In awk it would look something like this:

awk -F. '{curr=$4}NR!=1&&prev+1!=curr{print "Gap Between: "prev" "curr}{prev=$4}' sorted_file

The output from your example input would be:

Gap Between: 758 760

Brian
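Brian's one-liner prints the two end points of each gap. If you would rather list every missing number, the same idea can loop over the numbers in between. A sketch under the same assumptions (counter in field 4, input sorted by it); list_missing is a hypothetical name:

```shell
# Sketch: list every missing counter value, one per line.
list_missing() {
    awk -F. '
        # For every line after the first, print each number strictly
        # between the previous counter and this one.
        NR > 1 {
            for (n = prev + 1; n < $4 + 0; n++)
                print "Missing file number : " n
        }
        { prev = $4 + 0 }
    ' "$1"
}
```

Usage: list_missing sorted_file. Adding 0 to $4 forces a numeric value, so a field such as 008 compares as 8.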

- There is no " after curr, so the quotes balance now: one pair around "Gap Between: "
and one pair around the space between prev and curr.

I haven't tested this, but this _might_ work where the numbers wrap:

awk -F. '{curr=$4; if (curr==0){curr=1000}}
NR!=1&&prev+1!=curr{print "Gap Between: "prev" "curr}
{prev=curr%1000}' sorted_file

If your file is sorted, then won't 000 come first instead of after 999?

Brian
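If the counter really does wrap from 999 to 000 and the lines are kept in the order the files were produced (not sorted, for the reason Brian gives), the expected successor can be computed with modular arithmetic instead of special-casing 000. A sketch assuming a 3-digit counter; find_gaps_wrapping is a hypothetical name:

```shell
# Sketch: gap check for a 3-digit counter that wraps from 999 to 000.
# Assumes the input is in production order, not sorted.
find_gaps_wrapping() {
    awk -F. '
        # Expected next number wraps: 999 is followed by 000.
        NR > 1 && ($4 + 0) != (prev + 1) % 1000 {
            printf "Gap Between: %03d %03d\n", prev, $4
        }
        { prev = $4 + 0 }
    ' "$1"
}
```

The %03d format keeps the zero padding of the original file names in the output.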

To print the entire file name, change the print action to:

{print "Gap Between: "prev" "curr" at line: "$0}

Since $0 is the current line, that prints the line AFTER the gap.

Brian
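To show the full file names on both sides of the gap, awk can also remember the previous whole line. A sketch along the same lines as the one-liner above; gap_names is a hypothetical name:

```shell
# Sketch: print the last file before each gap and the first file after it.
gap_names() {
    awk -F. '
        NR > 1 && prev + 1 != $4 + 0 {
            print "Gap Between: " prevline " and " $0
        }
        # Remember both the counter and the whole line for the next record.
        { prev = $4 + 0; prevline = $0 }
    ' "$1"
}
```

For the sample file, gap_names sorted_file should name the .758 and .760 files.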
