The 3D data exchange format discussion

in preparation for the 3rd OPTICON 3D Working Group Meeting,
25-26 June 2001, in Lyon

Discussion introduction by Jeremy Walsh (ESO)
... I think it is vital that we come up with a 3D Format from the Lyon meeting. Everyone may not agree with the final decision but if we are going to have a solid basis for the RTN and for true exchange of 3D data among ourselves we have to agree one format. This doesn't preclude using your own favourite format but as far as exchange of data is concerned and the production of common tools, there can only be one.
We all agree that FITS should be the basis of the pixel storage. There are then two alternatives:

3D FITS cube
Stacked spectra in 2D FITS image(s) and associated file to define the spectrum related quantities (e.g. position on sky, etc)

From the Potsdam meeting there are proponents of both.
I briefly review what formats are currently in use:

Tiger	Binary image (.tig) + FITS table of positions
Integral	2D FITS + ASCII fibre position list
VIMOS	3D FITS cube
SPIFFI	3D FITS cube
Teifu	Stacked spectra in 2D FITS + position table
GIRAFFE	3D FITS cube
PMAS	Stacked spectra in 2D FITS + FITS table of positions
CIRPASS	Stacked spectra in 2D Multi-Extension FITS + associated FITS table for position information

[See the minutes of the Potsdam meeting for more details.]
If a lossless transfer format is a necessity then 3D cubes will not be adequate to handle instruments with non-square pixels on the sky. If the stacked format + associated description is chosen we must decide how to store the data associated with each spectrum - either as FITS tables or as cubes, the latter suggested by Yannick....
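To make the two alternatives concrete, here is a minimal sketch in Python with NumPy. The array names, the toy sizes and the helper stacked_to_cube are illustrative assumptions, not part of any instrument's software. It holds the data as stacked spectra plus a position table, and rebuilds a 3D cube without resampling only when the positions fall on a square grid.

```python
import numpy as np

# Hypothetical toy data: 5 spectra of 4 wavelength pixels each.
n_lambda = 4
spectra = np.arange(5 * n_lambda, dtype=float).reshape(5, n_lambda)  # one spectrum per row

# Position table: one (x, y) row per spectrum, here in integer grid units.
# A 3x2 square grid with one element missing (x=2, y=1).
positions = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [1, 1]])


def stacked_to_cube(spectra, positions):
    """Rebuild cube[y, x, lambda] from stacked spectra on a square grid.

    Only valid when positions are integer grid indices; missing elements
    are left as NaN rather than interpolated.
    """
    positions = np.asarray(positions)
    if not np.array_equal(positions, np.round(positions)):
        raise ValueError("positions are not on a square grid; resampling would be needed")
    idx = positions.astype(int)
    nx, ny = idx.max(axis=0) + 1
    cube = np.full((ny, nx, spectra.shape[1]), np.nan)
    cube[idx[:, 1], idx[:, 0], :] = spectra
    return cube


cube = stacked_to_cube(spectra, positions)
print(cube.shape)                  # (2, 3, 4)
print(np.isnan(cube[1, 2]).all())  # True: the missing element stays a hole
```

For positions that are not on a square grid (for example hexagonal packing with half-integer offsets) the helper refuses to build a cube, which mirrors the point above: the stacked form keeps such data as observed, while a plain cube would require resampling.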

The 2D+cube format, described by Yannick Copin (LEI)

Comment by Bianca Garilli (MIL)
My starting point is Yannick's document just distributed (btw, thanks Yannick for the effort. It is very clear), where he makes an appreciable attempt to find a compromise between the 3D format and the 2D+T format. His 2D+C format, as far as I could understand, foresees an "image" with one spectrum per row, coupled with a 3D cube containing additional information (selection, total flux, etc).
This is a very nice approach, which I try to elaborate further. But first, let me address his "cons": as for points 1 & 2 (interpolation and reconstruction for hexagonal packing), these two are present no matter which storage method is considered. Even with the 2D+T format, in case one lens must be excluded, an interpolation is needed (or a hole left in the image). In the same way, resampling is required for hexagonal packing even if data are stored in 2D+T format. Thus I will not consider these two as cons against this particular storage method, but as unavoidable operations to be performed on some data.
The third point (non-regular grid and/or incomplete sky coverage) is also something which cannot be solved by a storage method. In these cases, the real disadvantage I see in the 2D+C with respect to the 2D+T is some waste of disk space. On the other hand, let me point out something we should not neglect: disk space (and also RAM) is cheap, manpower is expensive. To define a data storage method which is efficient in terms of disk space for these two particularly unlucky (and hopefully rare) cases, at the cost of complicating ALL algorithms (and therefore requiring more manpower), is not, in my opinion, a good choice.
As for the fourth point (libraries to manipulate), this is not COMPLETELY true, and in any case there is one way out. Both iraf (package ttools, part of tables/stsdas) and midas handle 3D tables. In the CFITSIO library, though, I could not find anything about 3D tables, but in the FITS format definition document (cf. http://archive.stsci.edu/fits/fits_standard/) a multidimensional array is mentioned. Thus, some tools do exist. A possible way out, in any case, would be to make not a 3D "table" but a data cube (i.e. a 3D image). In this case, there are tools to handle the different image planes. It is just a matter of "way of thinking" after all.
But I would like to elaborate further on Yannick's proposition. As he has shown a way to re-arrange the "additional information" in a cube, I wonder why we should not adopt the same re-arrangement for spectra, i.e. go from a 2D+C format to a "cube plus cube" format. Of course, even in this case Yannick's "cons 1 and 2" are still applicable (but as mentioned before, this is true for any storage). Con # 4 is not true any more (there are tools to manipulate cubes), Yannick's Pros are still valid, and we have an added pro: a slice of the 3D spectral cube is already a slice in the lambda plane, and to build a lambda integrated image means simply to add pixels in the lambda direction (and this is a real simplification of our lives). In the case of square packing, we know it is straightforward, while in the hexagonal packing case, to have a meaningful image we will have, as always, to "rebin" data in some way. But the data are stored WITHOUT binning, which is the really important point (thus Martin can really check whether a feature is real or is the result of a badly removed cosmic ray).
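The "added pro" can be illustrated with a small NumPy sketch. The axis order (lambda, y, x), the toy values and the variable names are assumptions made for the example; the discussion itself does not fix them.

```python
import numpy as np

# Hypothetical spectral cube with axes (lambda, y, x):
# 4 wavelength planes of 2x3 spatial elements.
cube = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)
cube[:, 1, 2] = np.nan  # an excluded lens is kept as a hole, not interpolated

# A monochromatic image is a single plane of the cube.
mono_image = cube[2]

# A lambda-integrated image is a sum along the wavelength axis.
# nansum treats NaN as 0, so holes are marked again explicitly afterwards.
integrated = np.nansum(cube, axis=0)
integrated[np.isnan(cube).all(axis=0)] = np.nan

# A single spectrum is the run of values at one spatial position.
spectrum = cube[:, 0, 1]

print(mono_image.shape)   # (2, 3)
print(integrated[0, 0])   # 36.0
print(spectrum)           # [ 1.  7. 13. 19.]
```

Nothing is rebinned at any point: the stored values are the extracted spectra, and any regridding for display (as in the hexagonal case) would be a separate, later step.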
Last, but not least, I would recommend 2 things

no matter which is the final format (3d, 2d+t, 2d+c, cube+cube), I strongly recommend that for each observation, everything is stored in one fits file, making an extensive use of fits extensions.
extensions should have appropriate names, to be decided before-hand and common to everybody
and, as is already planned, we must define clearly and to the best of our capabilities the keywords (k/w) to be put in extension headers, sticking as much as possible to standard fits definitions

Finally, let me "cast my vote" by e-mail, as suggested by Jeremy. My order of preferences is as follows:

cube + cube
3D
2D+cube

and I am definitely against the 2D+T!

Comment by Lowell E. Tacconi-Garman (ESO)
I, too, cannot be in Lyon for next week's meeting owing to a schedule conflict with our institute's annual Konzil meeting. But let me add my two cents (Euro cents, of course) by simply saying, "AMEN, Bianca!" I agree with every point she made.

Comment by Martin M. Roth (AIP)
As for the facts, Bianca, thanks for your comments: the C+C is exactly what I was about to throw in. At any rate, I have the feeling that thanks to Yannick's considerations we are on the right track. If we implicitly retain the individual spectra with no need for immediate spatial rebinning, reconciling also the considerations concerned with standard FITS I/O, we are in principle done. The nice thing would also be that true data cubes with orthogonal spatial sampling would form a natural subset, while irregularly spaced data "cubes" will need rebinning for further processing at some stage anyway, but still do not suffer loss of information.

Comment by Roland Bacon (CRAL)
Until now 3D instruments have been limited to a relatively small number of pixels. A step forward will be made with VIMOS but it will still be manageable. In the frame of the 2nd generation of VLT instrumentation we are proposing a new IFS that would have a very large number of pixels: i.e. 90000 spectra of 2048 pixels, covering a 1'x1' field at 0.2". For such a VERY big datacube, it is important to have some flexibility to address either spectra or monochromatic images without too much loss of efficiency. It is important to keep that in mind while thinking about the data format. Growth in the size of IFS datacubes is very probable, and it is important to look ahead while defining the 3D format.
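A rough size estimate for such a datacube can be sketched as follows. The 4-byte pixel size is an assumption not stated in the text, and the variable names are illustrative. The last lines show why the storage order matters for addressing spectra versus monochromatic images.

```python
# Figures from the text: 90000 spectra of 2048 pixels, 1' x 1' field at 0.2".
n_spectra = 90_000
n_lambda = 2048
field_arcsec = 60.0
sampling_arcsec = 0.2

# Consistency check: a 1' x 1' field at 0.2" gives 300 x 300 spatial elements.
n_side = round(field_arcsec / sampling_arcsec)
assert n_side * n_side == n_spectra

# Assumption (not in the text): 4-byte floating-point pixels.
bytes_per_pixel = 4
n_pixels = n_spectra * n_lambda
cube_bytes = n_pixels * bytes_per_pixel
print(n_pixels)            # 184320000
print(cube_bytes / 2**20)  # 703.125 (MiB)

# If the wavelength axis varies fastest in storage, one spectrum is a single
# contiguous read, but one monochromatic image needs one read per spectrum.
spectrum_bytes = n_lambda * bytes_per_pixel
reads_per_image_lambda_fastest = n_spectra

# If the spatial axes vary fastest, the situation is reversed.
image_bytes = n_spectra * bytes_per_pixel
reads_per_spectrum_space_fastest = n_lambda
print(spectrum_bytes, image_bytes)  # 8192 360000
```

Under these assumptions neither ordering serves both access patterns equally well, which is the flexibility concern raised above.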
Another item is that it is also important to have a companion/associated datacube with the noise variance estimate. It is required if we want to optimally co-add a series of datacubes.
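One common way to use such a variance cube is inverse-variance weighting, sketched below with NumPy. This is an illustration under stated assumptions (independent noise between exposures, strictly positive variances, all cubes already on the same grid) and not necessarily the scheme the author had in mind; the function name coadd is hypothetical.

```python
import numpy as np


def coadd(cubes, variances):
    """Inverse-variance weighted mean of datacubes sampled on the same grid.

    cubes, variances: sequences of arrays with identical shapes.
    Returns the combined cube and its variance, assuming independent noise.
    """
    cubes = np.asarray(cubes, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances
    total_weight = weights.sum(axis=0)
    combined = (weights * cubes).sum(axis=0) / total_weight
    combined_variance = 1.0 / total_weight
    return combined, combined_variance


# Two toy exposures: the second is three times noisier, so it counts less.
shape = (2, 2, 3)
cube_a, var_a = np.full(shape, 1.0), np.full(shape, 1.0)
cube_b, var_b = np.full(shape, 3.0), np.full(shape, 3.0)
combined, combined_variance = coadd([cube_a, cube_b], [var_a, var_b])
print(combined[0, 0, 0], combined_variance[0, 0, 0])
```

Without the per-pixel variance there is no basis for the weights, so the combination falls back to a plain average that treats noisy and clean exposures alike.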

21-Jun-2001
