Dr2

Data release 2 discussion

Things to think about for dr2

[RMS 13 May 14] Do the files need to have processing history? This currently isn't implemented in psrsh
[RMS 13 May 14] Dynamic spectra would be nice. These could easily be produced using psrchive and pulsar templates
[GH 10 Jan 14] Must work out how to deal with .psrchive.cfg and baseline widths properly (current pipeline does nothing)
[GH 10 Jan 14] Should we have any way of linking the processing/residuals to the original observing log file?

[MK 10 Jan 14] Reproducibility -- what does an outside user need to reproduce our work?

There are lots of corrections to be made to individual data files.
There are configurations that happen on a per-pulsar basis. E.g., what is the fraction of the pulse used to compute the baseline?
I propose to handle these by generating a psrsh script that corrects the headers / time offsets in memory. Then we never need to touch the raw data, but we can:
- produce a corrected archive simply by saving to disk
- let anyone produce a corrected version simply by grabbing the raw data and the psrsh script. Thus the complexity of how we correct the files can stay internal.
The psrsh script would itself be generated by another program we keep under version control, probably written in Python. It could either access the existing database of flags, or we could generate something new. I tend to favor something that's both human-readable and parsable (e.g. YAML).
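As a rough illustration of the generator idea above, here is a minimal Python sketch that turns a set of correction flags into psrsh script text. Everything specific in it is an assumption for illustration: the flag layout, the file name, the header key and value, and the use of psrsh `edit key=value` lines all need checking against the psrsh version we actually run. A plain dict stands in for the parsed YAML so the sketch needs nothing beyond the standard library.

```python
"""Sketch: turn per-file correction flags into the text of a psrsh script."""

# Hypothetical flags for one raw archive. In practice these would be parsed
# from a version-controlled YAML file; a plain dict stands in here so the
# sketch has no third-party dependencies.
EXAMPLE_FLAGS = {
    "archive": "example_raw.ar",            # hypothetical file name
    "header": {"rcvr:name": "EXAMPLE_RX"},  # hypothetical header correction
    "notes": "receiver name recorded incorrectly",
}


def render_psrsh(flags):
    """Return psrsh script text that applies the header corrections."""
    lines = [
        "#!/usr/bin/env psrsh",
        "# Auto-generated from the correction flags; do not edit by hand.",
        f"# archive: {flags['archive']}",
    ]
    if flags.get("notes"):
        lines.append(f"# reason: {flags['notes']}")
    # Sort the keys so identical flags always give a byte-identical script,
    # which keeps diffs under version control meaningful.
    for key, value in sorted(flags.get("header", {}).items()):
        # ASSUMPTION: header fixes are written as psrsh 'edit key=value' lines.
        lines.append(f"edit {key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_psrsh(EXAMPLE_FLAGS), end="")
```

Per-pulsar settings such as the baseline fraction could live in the same flags file. The raw data stay untouched: an outside user only needs the raw archive plus the generated script.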

[GH 10 Jan 14]: a whole heap of comments on issues coming from the dr1:

How are we going to store the final results? What files will be made available? Will the files be public, given to the IPTA, or something else?
Provenance - everything needs to be reproducible. What are the implications for the pipeline?
Will we be making use of psrsh? If so, there seem to be a few issues and it doesn't implement everything we'll need. Who can update this software?
What pcm processing should we use? What about the issue that the engineers/Ron's test suggests low cross-coupling whereas PCM suggests very high levels? Do we care?
- [RMS 13 May 14]: We want to use the best calibration method. It would be good to keep track of polarisation parameters as a function of time. Is there evidence that they change?
The DFBs still exhibit over-polarisation whereas CASPSR did exhibit under-polarisation. Do we care? What are we going to do about it?
How do we properly deal with simultaneous observations with multiple backends in the same observing band?
Do we make use of the time delay measurements? If so, do we use them as starting guesses or as absolute values?
Shall we develop a new method for flagging data or make use of the existing database of bad observations?
Should we use MTM for template matching?
Are we going to use the invariant interval again for 0437?
Some of the biggest challenges with dr1 were getting the jumps, deltaDM(t) and red noise models. We had to iterate multiple times to get these.
dr1 did not provide white noise models (e.g., EFAC/EQUAD)
How do we ensure that log files (from e.g., the pipeline processing) get seen by the users of the data?
Should we take account of TOASTER, the pipeline system set up by the EPTA?
Do we wish to scrunch completely in time, frequency and polarisation, or keep some subints/frequency channels?
How do we process the P140 data? For dr1e we just took Joris' .tim file. It would be good to reprocess the original data, but I think that a lot of it has been lost.

[WAC 9 Jan 14] Data Products - Vote Early and Vote Often

TOAs by subchannel and subint
- [MTK 10 Jan 14] what level of granularity would we want to keep here?
Dynamic Spectra

[GH: 11 Jan 14]

Why do we get strange outliers in the fluxcals from our current processing? Where do they come from and what are the implications?

[DB: 15 Jan 14]: version control, provenance

Under "Reproducibility" on 10 Jan 14, MK mentions version control for Python scripts etc. The PulsarVL Stash git project can be used if so desired.
In relation to GH's provenance point on 10 Jan 14, I spoke briefly with Nick Car just before the break about his PROV/PROMS provenance system.
- This may be of benefit to the pipeline work. Generating provenance metadata (RDF) from DAP is also a roadmap item.
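To make the provenance and reproducibility points above a bit more concrete, here is a minimal Python sketch of a per-file provenance record: checksums of the raw file and of the correction script, plus whatever tool versions the caller supplies. It is not the PROV/PROMS format, just plain JSON, and the field names, file names and the `pipeline_commit` entry are all hypothetical.

```python
import hashlib
import json
import platform
import sys
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 of a file, read in chunks so large archives are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(raw_path, script_path, tool_versions):
    """Collect what an outside user would need to repeat one processing step."""
    return {
        "raw_file": {"name": Path(raw_path).name, "sha256": sha256_of(raw_path)},
        "script": {"name": Path(script_path).name, "sha256": sha256_of(script_path)},
        # Supplied by the caller, e.g. the git commit of the pipeline code.
        "tools": dict(tool_versions),
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    # Stand-in files so the sketch runs anywhere; real use would point at a
    # raw archive and its generated correction script.
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "example_raw.ar"
        script = Path(tmp) / "example_fix.psh"
        raw.write_bytes(b"stand-in for raw archive bytes")
        script.write_text("# stand-in psrsh script\n")
        record = provenance_record(raw, script, {"pipeline_commit": "UNKNOWN"})
        json.dump(record, sys.stdout, indent=2, sort_keys=True)
        print()
```

A record like this could be written next to each data product, which would also give users a link back to the processing that produced it.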