bp -- bibliography package for Perl
Dana Jacobsen (dana@acm.org)
12 March 1996

bp.pl is the main file that gets included. It will include all the other
files as needed. Since this package was started before Perl version 5, I
wrote dynamic loading routines myself using the require command. This is
not as spiffy as Perl 5's dynamic loading, but works just fine, and allows
Perl 4 to be used. The variable $glb_bpprefix in the main package controls
the prefix used for the additional files. This is "bp-" right now. If you
want to change this, go ahead. The two things I had in mind when I made
this were "bp/" and "/usr/local/lib/bp/". The first would have all the bp
files in a subdirectory of the directory bp.pl was in. The second would
only clutter /usr/lib/perl with one file, but isn't recommended, since an
even cleaner solution is to set your BPHOME directory to wherever you put
all the bp files. Personally, I just leave them in my source directory
and set BPHOME to point to them.
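
The loading scheme above can be sketched roughly as follows, in Perl 5
syntax. This is only an illustration: the helper names bp_module_file and
bp_add_home are hypothetical and are not taken from bp.pl. Only
$glb_bpprefix, BPHOME, and the ".pl" file naming come from these notes.

```perl
use strict;
use warnings;

# Prefix prepended to every additional bp file ("bp-" per the notes).
our $glb_bpprefix = "bp-";

# Hypothetical helper: build the file name that require would be given.
sub bp_module_file {
    my ($name) = @_;
    return $glb_bpprefix . $name . ".pl";
}

# Hypothetical helper: put BPHOME at the front of @INC, if it is set.
sub bp_add_home {
    unshift(@INC, $ENV{BPHOME})
        if defined $ENV{BPHOME} && length $ENV{BPHOME};
    return scalar @INC;
}

bp_add_home();
print bp_module_file("refer"), "\n";       # bp-refer.pl
{
    # The subdirectory layout mentioned above.
    local $glb_bpprefix = "bp/";
    print bp_module_file("refer"), "\n";   # bp/refer.pl
}
```

A program would then pass the result to require, which searches @INC in
order, so a BPHOME entry at the front is tried first.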

Modules are searched for by using require, so they can be found anywhere in
your @INC (properly written bp programs will add BPHOME to your @INC as the
first thing they do). The rules for module names (not enforced currently)
are that they should not be named 'p-...', 's-...', or 'cs-...' because
those are reserved for packages, styles, and character sets, respectively.
Names like "xxx2yyy" are reserved for converters. I'm trying to keep all
the files down to 8 characters (not counting the bpprefix) just in case
some people use crippled operating systems like DOS.
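
Since the naming rules are not enforced, a checker might look something
like the sketch below. Both helper names are hypothetical, and the pattern
used for "xxx2yyy" (lowercase letters on both sides of the 2) is an
assumption, as is the example name refer2bibtex.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a file name (without the bp prefix)
# according to the reserved name patterns in the notes.
sub bp_name_kind {
    my ($name) = @_;
    return "package"   if $name =~ /^p-/;
    return "style"     if $name =~ /^s-/;
    return "charset"   if $name =~ /^cs-/;
    return "converter" if $name =~ /^[a-z]+2[a-z]+$/;   # assumed pattern
    return "module";
}

# Hypothetical helper: true if the name fits the 8 character limit.
sub bp_name_fits_dos {
    my ($name) = @_;
    return length($name) <= 8 ? 1 : 0;
}

for my $name (qw(p-errors cs-tex refer refer2bibtex)) {
    printf "%-14s %-10s %s\n", $name, bp_name_kind($name),
        bp_name_fits_dos($name) ? "ok" : "longer than 8 characters";
}
```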

--- timing commands used here are
time perl5 tconv.pl -convert -format=refer,bibtex ad425.ref > foo
time perl5 tconv.pl -convert -format=bibtex,refer foo > foo.ref
time perl5 tconv.pl ad425.ref > foo2.ref
---

version notes:

0.2.2 perl5/ad425.ref->ad425.bib in 20.0 seconds
perl5/timetest.pl 2.91s total, 0.92, 0.94, 3.11

Added fontcheck to validate font changes.
All new Medline module.
Changed options parsing.
Unimplemented and unsupported methods have new stdbib routines.
Modified name_to_canon routine.
Added EndNote support to Refer module.
Fixed nasty typo in bp.pl's tocanon.
(changed perl version to 5.002 and slight machine changes)

0.2.0 perl5/ad425.ref->ad425.bib in 22.1 seconds
perl5/timetest.pl 2.95s total, 1.05, 1.00, 3.32

Changed BibTeX string parsing.
Minor cleanup to distribution.

0.1.4 perl5/ad425.ref->ad425.bib in 22.6 seconds
perl5/timetest.pl 2.89s total, 1.09, 1.02, 3.66

General cleanup.
Put refer tocanon code through optimizer and generally cleaned up.
All formats and character sets use the 8bit code.
New apple and ISO 8859-1 character sets.
Some changes to HTML output.

0.1.3 perl5/ad425.ref->ad425.bib in 30.7 seconds
perl5/timetest.pl 2.71s total, 1.07, 1.09, 3.63

Major changes to fonts. We use 8bit internally with one escape char.
Changed to use $cs_sep instead of $; everywhere.
Names are separated with $cs_sep and parts by $cs_sep2.
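
A minimal sketch of the two-level separator idea. The separator values
below are assumptions (these notes do not say which characters are used),
the helper names are hypothetical, and the order of the name parts is
arbitrary here.

```perl
use strict;
use warnings;

# Assumed values: the notes do not give the actual separator characters.
our $cs_sep  = "\034";   # between names
our $cs_sep2 = "\035";   # between the parts of one name

# Hypothetical helper: pack a list of names, each an array ref of parts.
sub join_names {
    my @names = @_;
    return join($cs_sep, map { join($cs_sep2, @$_) } @names);
}

# Hypothetical helper: unpack the string into a list of array refs.
# The -1 limit keeps trailing empty parts.
sub split_names {
    my ($packed) = @_;
    return map { [ split(/\Q$cs_sep2\E/, $_, -1) ] }
           split(/\Q$cs_sep\E/, $packed, -1);
}

my $packed = join_names([ "Jacobsen", "Dana" ], [ "Knuth", "Donald", "E." ]);
my @names  = split_names($packed);
print scalar(@names), " names; the first has ",
      scalar(@{ $names[0] }), " parts\n";
```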

0.1.2 perl5/ad425.ref->ad425.bib in 25.0 seconds
perl5/timetest.pl 2.36s total, 1.10, 1.10, 3.62

Changed glb_filename to glb_Ifilename in bp-p-errors.pl.
We open >>- instead of >- by default.
bp-bibtex.pl sets glb_vloc to the line number.
Changed format option parsing routine in bp-p-options.
Added opt_reverseauthor to bp-refer, and added options parsing.

0.1.1 perl5/ad425.ref->ad425.bib in 25.3 seconds
perl5/timetest.pl 2.31s total, 1.03, 1.01, 3.52

Added p-output.pl. HTML output calls this. Minor changes to other
programs. Added example (lousy) Melvyl reader.

0.1.0 Added html charset and output format.
XXX check out RFC 1345 on charsets. It would be nice to let us read
these descriptions in. That may be another project though.

0.0.13 perl5/ad425.ref->ad425.bib in 24.9 seconds
perl5/timetest.pl 2.07s total, 1.12, 1.14, 3.53

Added new formats (ieee, medline, tib). Fleshed out TeX charset
conversion including changes to fonts. Minor changes in library.
New programs count, sort, and rdup.

0.0.12 perl5/ad425.ref->ad425.bib in 22.9 seconds
perl5/timetest.pl 1.94s total, 1.20, 1.22, 3.68

MAJOR changes in the way files are handled internally. Input and
Output files are now stored in separate locations. There is no
openwrite function any more -- open is called with a prefixed mode,
just like the system open, i.e. "open('foo')" vs. "open('>foo')".
The system filehandles are not supported -- use '-' for STDIN or
STDOUT.
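
A rough sketch of how a prefixed mode could be picked apart. The helper
name parse_open_arg and its return values are hypothetical; only the '>'
and '>>' prefixes and the '-' convention come from these notes.

```perl
use strict;
use warnings;

# Hypothetical helper: split a file argument into a mode and a name.
# No prefix means read, '>' means write, '>>' means append.
sub parse_open_arg {
    my ($arg) = @_;
    my ($prefix, $name) = $arg =~ /^(>>|>)?(.*)$/s;
    my $mode = !defined $prefix ? "read"
             : $prefix eq ">>"  ? "append"
             :                    "write";
    my $is_std = $name eq "-" ? 1 : 0;   # '-' stands for STDIN or STDOUT
    return ($mode, $name, $is_std);
}

for my $arg ("foo", ">foo", ">>-", "-") {
    my ($mode, $name, $is_std) = parse_open_arg($arg);
    my $target = !$is_std        ? "file $name"
               : $mode eq "read" ? "STDIN"
               :                   "STDOUT";
    printf "%-5s %-7s %s\n", $arg, $mode, $target;
}
```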

Changed the way the auto format handles character sets (and formats to
a lesser degree). Debugging dumps with glb_debug==2. More dump info
printed. bp_find_files supports 'rehash' to re-read directories.

Finally removed the global rfmt and rcset info. Use the specific
data for the global input and output files, which change each time
an I/O operation is done, rather than when format is called.

Tried stripping all the comments and debugging calls from the main
package, and stuffing everything into one file. (About half of the
code size is comments.) Package loading time went down maybe .1 - .2
seconds (base of .75). No noticeable change in running time, which
means the debugging statements aren't slowing us down enough to worry
about dropping them.

0.0.11 perl5/ad425.ref->ad425.bib in 22.7 seconds
perl5/timetest.pl 1.98s total, 1.02, 1.01, 3.39

Added format registration. Key generation and registration added to
bp_util, and the _output_ format is responsible for calling it if it
wants a key. This means we don't have to be generating keys if we
don't need them.

Changes to TeX protection. Unrolled cs-tex tocanon map_simp. We
now only call parse_format in non-time-critical routines like format
and open.

0.0.10 perl5/ad425.ref->ad425.bib in 23.7 seconds
perl5/timetest.pl 1.89s total, 0.93, 0.95, 3.33

Revamped option and stdarg parsing. Added glb_in/out cset so we
make fewer calls to parse_format, which is why our time went back
down. Also changed debugging levels a bit, to reserve 0 and 1 for
true and false.

Split into modules, as the bp.pl file got over 40k. I should
probably split it even further.

0.0.9 perl5/ad425.ref->ad425.bib in 25.3 seconds
perl5/timetest.pl 1.89s total, 0.96, 0.96, 3.38

Major changes to font parsing. Time suffers because we now protect
TeX characters, so we have to do that for _every_ string we get,
instead of doing a quick '<|>' check to skip it. That one change
makes the 2608 record time go from 127 seconds to 140 seconds.

Lots of changes to BibTeX and Refer parsers also, but little to the
main package.

0.0.8 perl5/ad425.ref->ad425.bib in 23.5 seconds
perl5/timetest.pl 1.59s total, 0.94, 0.91, 3.35

0.0.7 perl5/ad425.ref->ad425.bib in 28.6 seconds
perl5/timetest.pl 1.36s total, 0.92, 0.91, 3.11

passes tests.

0.0.5 perl5/ad425.ref->ad425.bib in 55.6 seconds

time perl5 tconv.pl -canon -informat=refer -outformat=bibtex ad425.ref >f.bib
time perl5 tconv.pl -canon -informat=bibtex -outformat=refer f.bib > f.ref
time perl5 tconv.pl ad425.ref > fo.ref

diff fo.ref f.ref

Optimization issues:
> One of the hardest issues I've had to deal with is the slowness of
function calls in perl. A null procedure call is about 150 - 700
times slower than C, depending on the platform, and whether you've
compiled the perl code (yes, even compiled perl code still does
function calls very slowly as of perl compiler a3). This means there
is a tradeoff between modular code and speed. In 0.3.0, setting
options once per record adds about 10% to the run time! This is one
reason I put something like "&debugs(message) if $glb_debug" -- the
test for $glb_debug saves a _lot_ of time.
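
The effect of the guard can be illustrated by counting calls. In this
sketch debugs is only a stand-in for the package's routine, and the two
process_* routines are hypothetical. It shows that the guarded form makes
no function call at all when debugging is off; it does not measure time.

```perl
use strict;
use warnings;

our $glb_debug = 0;
my $calls = 0;

# Stand-in for the package's debugs routine; it counts how often it runs.
sub debugs {
    my ($message) = @_;
    $calls++;
    print STDERR "debug: $message\n" if $glb_debug;
    return;
}

# Unguarded: one function call per record, even with debugging off.
sub process_unguarded {
    my ($n) = @_;
    my $before = $calls;
    debugs("record $_") for 1 .. $n;
    return $calls - $before;
}

# Guarded: only a scalar test per record when debugging is off.
sub process_guarded {
    my ($n) = @_;
    my $before = $calls;
    for (1 .. $n) {
        debugs("record $_") if $glb_debug;
    }
    return $calls - $before;
}

print "unguarded calls: ", process_unguarded(1000), "\n";   # 1000
print "guarded calls:   ", process_guarded(1000), "\n";     # 0
```

To see the actual cost on a given machine, the two routines could be timed
with the standard Benchmark module.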

Issues to deal with:
> Anything with an XXXXX

> Look into the possibility of using references for our associative
array calls. This might save us some time, but opens up some interesting
problems as well. In 0.0.10, changing the calls to and from canon to use
references made the time go from 24.6 to 23.9 seconds. It should be safe
there, as we're passing in a reference to our local copy of the user's
data, but I'd want to do some more tests to make sure.
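
The difference between the two calling styles, and the kind of problem a
reference opens up, can be sketched like this. The routine names and the
TITLE/AUTHOR fields are hypothetical and are not the package's real
tocanon code.

```perl
use strict;
use warnings;

# By value: the hash is flattened into a list and copied into a new hash.
sub tocanon_by_value {
    my (%entry) = @_;
    $entry{TITLE} = uc $entry{TITLE};
    return %entry;
}

# By reference: no copy is made, but the caller's hash is changed in place.
sub tocanon_by_ref {
    my ($entry) = @_;
    $entry->{TITLE} = uc $entry->{TITLE};
    return $entry;
}

my %record = (TITLE => "bp notes", AUTHOR => "Dana Jacobsen");

my %copy = tocanon_by_value(%record);
print "after by-value: $record{TITLE}\n";   # bp notes (caller's data untouched)

tocanon_by_ref(\%record);
print "after by-ref:   $record{TITLE}\n";   # BP NOTES (caller's data changed)
```

Passing a reference to a local copy, as described above, keeps the speed
gain while leaving the user's own data alone.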

> Do we force them to call &format, or set a default? Right now, we
set a default. If we remove the default, then we must check in
open that the format is loaded. And hope they don't call read without
opening a file...

> Right now we do:
tocanon: ics/ifm -> ccs/ifm -> ccs/cfm
fromcanon: ccs/cfm -> ccs/ofm -> ocs/ofm

Consider:
tocanon: ics/ifm -> ics/cfm -> ccs/cfm
fromcanon: ccs/cfm -> ocs/cfm -> ocs/ofm

where cs=character set, fm=format, i=input, o=output, c=canonical.

The advantage of our current system is that the format routines are
independent of the character set used. This is a big plus when we
want to program tib, which can then reuse most of the refer routines.
The disadvantage is that some formats can be better written if they
can see their special characters.
My feeling as of 0.2.2 is that the current system is what we ought
to do.
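
The two orderings for tocanon can be compared with a toy example.
Everything here is hypothetical: the toy character set escapes '&' as
'\&', the toy format is a refer-like "%X value" line, and none of the
routine names come from the package.

```perl
use strict;
use warnings;

# Charset step (ics -> ccs): unescape the toy input charset.
sub cs_tocanon {
    my ($text) = @_;
    $text =~ s/\\&/&/g;
    return $text;
}

# Format step (ifm -> cfm): parse refer-like lines into a hash.
sub fm_tocanon {
    my ($text) = @_;
    my %rec;
    for my $line (split /\n/, $text) {
        $rec{$1} = $2 if $line =~ /^%(\w)\s+(.*)$/;
    }
    return \%rec;
}

# Current order: convert the character set first, then parse the format.
# The format routine never sees the input charset's special characters.
sub tocanon_current {
    my ($text) = @_;
    return fm_tocanon(cs_tocanon($text));
}

# Alternative order: parse the format first, then convert each field.
# The format routine sees the raw special characters.
sub tocanon_alternative {
    my ($text) = @_;
    my $rec = fm_tocanon($text);
    $rec->{$_} = cs_tocanon($rec->{$_}) for keys %$rec;
    return $rec;
}

my $input = "%A Dana Jacobsen\n%T Refer \\& BibTeX\n";
print tocanon_current($input)->{T}, "\n";       # Refer & BibTeX
print tocanon_alternative($input)->{T}, "\n";   # Refer & BibTeX
```

For simple input both orders give the same result. The difference is in
which routine has to know about the special characters.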

> Complexity option, standardized. We should have some sort of scale:
0 Stupid as can be
10 Simple -- assume best case
100 Optimistic -- formatted correctly, but complicated
1000 Pessimistic -- Try to handle anything thrown at you
Maybe a value on this scale could be returned to indicate how complex
the material is that can be handled. This might be useful when deciding
whether to make the in->canon or the canon->out converter use
heuristics.

> Character sets are second-class. For instance, we allow special
converters for formats, but not for charsets. We don't allow options
to be passed to them. Setting the correct charset is flaky.
(version 0.0.12 looks like it improved internal handling, but they're
still 2nd class)

> Check for variable suicide. We use fmt and cset a lot.

> Add a new function, 'formatfunction' that calls a special function
for that format. Some formats may wish to export functions that
might be useful, such as validation checkers.
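
One possible shape for such a function is sketched below. Nothing like
this exists in the package yet, so the registry layout, the function name
"validate", and the very crude BibTeX check are all assumptions made for
illustration.

```perl
use strict;
use warnings;

# Hypothetical registry: format name => { function name => code ref }.
my %format_functions = (
    bibtex => {
        # Crude check: does the text start like "@type{"?
        validate => sub {
            my ($text) = @_;
            return $text =~ /^\s*\@\w+\s*\{/ ? 1 : 0;
        },
    },
);

# Sketch of the proposed formatfunction: look up a function exported by a
# format and call it. Returns undef if the format does not provide it.
sub formatfunction {
    my ($format, $function, @args) = @_;
    my $code = $format_functions{$format}
            && $format_functions{$format}{$function};
    return unless ref($code) eq 'CODE';
    return $code->(@args);
}

my $ok = formatfunction("bibtex", "validate", '@article{key, title = "x"}');
print "bibtex validate: $ok\n";
print "refer validate is not provided\n"
    unless defined formatfunction("refer", "validate", "%T x");
```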

> What if we have a special converter, but no information on the formats
involved? Could we somehow set that up so we could just call convert
and be happy?
