In some respects the SAM/BAM specification is quite loose, in that there is more than one way to represent a given piece of information. We can take advantage of this to reduce the size on disk of mapped reads that match the reference sequence, while still conforming to the spec. I've written a SAM/BAM reference-based compression script in Python - put this in your pipeline and smoke it!
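To make the idea concrete, here is a minimal sketch of one way the spec allows this, and not necessarily what the script itself does: the SAM format permits `=` in the SEQ field to mean "same base as the reference", so long runs of matching bases collapse into runs of one repeated character, which compress well. The function name `mask_matching_bases` and the plain-string reference are assumptions made for illustration; a real pipeline would read records and the reference from files.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def mask_matching_bases(seq, cigar, pos, reference):
    """Return seq with bases that equal the reference replaced by '='.

    Hypothetical helper for illustration only.
    pos is the 1-based leftmost mapping position (the SAM POS field);
    reference is the full reference sequence as a plain string.
    """
    if seq == "*" or cigar == "*":
        return seq  # sequence or alignment not stored, nothing to mask

    out = []
    q = 0        # 0-based index into the read
    r = pos - 1  # 0-based index into the reference
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned columns: compare base by base.
            for i in range(n):
                base = seq[q + i]
                same = base.upper() == reference[r + i].upper()
                out.append("=" if same else base)
            q += n
            r += n
        elif op in "IS":
            # Insertions and soft clips have no reference base; keep as-is.
            out.append(seq[q:q + n])
            q += n
        elif op in "DN":
            # Deletions and skipped regions consume reference only.
            r += n
        # H and P consume neither read nor reference.
    return "".join(out)


reference = "ACGTACGTAC"
print(mask_matching_bases("ACGAACG", "7M", 1, reference))  # ===A===
```

Only the mismatching base survives as a letter, and the original read can be recovered by substituting the reference bases back in, so no information is lost as long as the same reference is available.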