Channel: foxglove – Foxglove

Counting the books


As a follow-up to my previous rather excited posting, I have finally got round to counting how many copies of each title selected for the ELTeC English collection are held by various prestigious national libraries. Here’s how I have operationalised the need for some kind of metric approximating to the persistence or canonicity of a given title.

First I run a little XSLT script against the corpus to create a file full of lines like the following:

f @and @attr 1=1003 sinclair @attr 1=4 "modern flirtations"
set marcdump ENG18410.usmarc
show all

This means:

      • find records in which the author field contains “sinclair” and the title contains the words “modern” and “flirtations”.
      • send the output to a file called ENG18410.usmarc
      • display all the results from that query

Creating this query automagically is not without problems. Including words like “the” or punctuation like the question mark is ill advised. Some records include subtitles in their “titles” but most don’t. When the records do contain subtitles they may result in false hits: see further below.
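
For anyone who would rather not do this in XSLT, here is a minimal Python sketch of the same query-building step. It is not the script I actually ran. The function names, the tiny stop list and the decision to cut the title at the first colon or semicolon are all assumptions made for illustration.

```python
import re

# Assumed stop list: the post only names "the" as a word to avoid.
STOPWORDS = {"the", "a", "an"}

def title_keywords(title):
    """Lowercase the title, drop any subtitle, punctuation and stop words."""
    main = re.split(r"[:;]", title, maxsplit=1)[0]   # keep the part before a subtitle
    found = re.findall(r"[a-z0-9']+", main.lower())  # punctuation such as "?" is dropped
    return [w for w in found if w not in STOPWORDS]

def build_query(ident, surname, title):
    """Return the three command lines for one title (hypothetical helper)."""
    phrase = " ".join(title_keywords(title))
    return "\n".join([
        f'f @and @attr 1=1003 {surname.lower()} @attr 1=4 "{phrase}"',
        f"set marcdump {ident}.usmarc",
        "show all",
    ])

print(build_query("ENG18410", "Sinclair",
                  "Modern Flirtations: or, A Month at Harrowgate"))
```

Cutting at the colon keeps the query short, at the price of matching anything whose title merely starts the same way.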

Next I throw this at a Z39.50 server and go make myself a cup of tea while it chunters away. As noted in my previous posting, getting Z39.50 access to the library in question is mostly just a matter of knowing the address of the server and its port, the name of a database, and sometimes (as with the British Library) also wheedling a login and password. The reason I use the recondite syntax above for my query input, and the reason that I accept the results in USMARC (MARC 21) format, is … that’s what every Z39.50 server I have looked at so far promises to provide. Some have other exotic options for query or for output, but nothing else is universally guaranteed to work.

Returning with my cup of tea, I now have a bunch of inscrutable MARC 21 records tidily filed away. I wasted the best part of an evening yesterday trying but failing to find a simple online tool which would convert them into MARCXML or indeed anything readable, but the best I could come up with was a Perl utility called marcdump. Here’s the start of the output it gives me for ENG18410.usmarc:

LDR 00535nam a2200181uu 4500
001 006812208
005 20100212180700.0
008 040420s1841 xx || 000 ||eng
019 u _aG11034382
040 _aUk
_cUk
082 04 _a823
100 1 _aSinclair, Catherine,
_d1800-1864.
245 10 _aModern flirtations :
_bor, A month at Harrowgate /
_cCatherine Sinclair. Vol. 1.
260 _a[S.l.] :
_b[s.n.],
_c1841.
336 _atext
_2rdacontent
337 _aunmediated
_2rdamedia
338 _avolume
_2rdacarrier
852 41 _aBritish Library
_bDSC
_jW5/2649

Exciting stuff, eh? The useful bit here is the publication date, which appears as subfield _c of field 260 here (sadly, there are other possibilities). Even more useful is the following, which appears at the end of the output file:

Recs Errs Filename
----- ----- --------
4 0 ENG18410.usmarc

’Tis but a matter of moments to grep through these files and extract a list of record counts for each title, together with a list of publication dates.
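
The same extraction can be sketched in Python instead of grep. This is an illustration only: it assumes the marcdump output has been saved as text in the layout shown above, that the summary row ends in a .usmarc filename, and that the date sits in subfield _c of field 260. The function name summarise is invented.

```python
import re

def summarise(dump_text):
    """Return (record_count, years) from the text that marcdump prints."""
    count = None
    years = []
    in_260 = False
    for raw in dump_text.splitlines():
        line = raw.strip()
        # Final summary row, e.g. "4 0 ENG18410.usmarc"
        summary = re.fullmatch(r"(\d+)\s+(\d+)\s+\S+\.usmarc", line)
        if summary:
            count = int(summary.group(1))
            continue
        # A new field starts with LDR or a three-digit tag; other lines continue it.
        if line.startswith("LDR") or re.match(r"\d{3}\b", line):
            in_260 = line.startswith("260")
        if in_260:
            # Take the first four-digit run after _c, so "[1841]" also works.
            year = re.search(r"_c\D*(\d{4})", line)
            if year:
                years.append(int(year.group(1)))
    return count, years

SAMPLE = """245 10 _aModern flirtations :
_cCatherine Sinclair. Vol. 1.
260 _a[S.l.] :
_b[s.n.],
_c1841.
336 _atext
Recs Errs Filename
----- ----- --------
4 0 ENG18410.usmarc"""

print(summarise(SAMPLE))  # (4, [1841])
```

Records that carry their date somewhere other than 260 _c will simply contribute no year here.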

Furthermore, and much to my relief, the counts do seem to reflect my initial expectations as to which titles would be highly rated and which not. The top ten titles in my 90 are (drumroll)…

94 ENG18860 Hardy: The Mayor of Casterbridge
106 ENG18531 Yonge: The Heir of Redclyffe
135 ENG18621 Braddon: Lady Audley’s Secret
143 ENG18481 Dickens: Dombey and Son
148 ENG18610 Eliot: Silas Marner
152 ENG18530 Dickens: Bleak House
157 ENG18540 Dickens: Hard Times
168 ENG18480 Thackeray: Vanity Fair
298 ENG18471 Bronte: Wuthering Heights
664 ENG18652 Carroll: Alice in Wonderland

Nearly all of these would figure on any list of long-lasting 19th c English novels. An eyebrow might be raised by some in the English department about the appearance of Yonge and Braddon, but the explanation is simple: both ladies (or their publishers) were very fond of including the phrase “by the author of ‘Most Famous Title’ ” on the title page of their less famous works, and I have not yet worked out how to remove such impostors as “Work you’ve never heard of (by the author of Most Famous Title)” from the results of a search for “Most Famous Title”.
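
One filter that might help, offered purely as an untested idea and not something I have run against the real records: keep a hit only when every query word occurs in the title proper (subfield _a of field 245), on the assumption that the “by the author of” phrase is catalogued in a later subfield. The helper names below are invented.

```python
import re

def words(text):
    """Set of lowercased word tokens in a piece of text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def is_probable_match(title_proper, query_words):
    """True when every query word occurs in the 245 _a text."""
    return set(query_words) <= words(title_proper)

query = ["modern", "flirtations"]
print(is_probable_match("Modern flirtations :", query))         # True
print(is_probable_match("Work you've never heard of /", query)) # False
```

Adaptations and study notes whose own title includes the famous one would still be counted, which is what I want.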

Another eyebrow might be raised at the frequency distribution of the scores found: there is a very long tail, with nearly two-thirds of my 90 titles scoring 20 or less, while the top scorers, as shown above, score very much more. To some extent, this is explained by the crudity of my search technique, which will include musical adaptations, commentaries, versions for the use of slow readers, study notes, etc etc provided that “Most Famous Title” appears in the title somewhere. This worries me less, since the existence of such things is surely also testimony to the salience of the title in question. This factor does however have an inflationary effect on the scores, so that titles which don’t benefit from it appear lower than might be expected. “Middlemarch”, for example, widely regarded as amongst the greatest English novels of the period but not subject to this effect, scores only 77, ahead of the Sherlock Holmes novel “The Sign of Four” (72) but behind “Mary Barton” (82) by Mrs Gaskell, George Eliot’s closest rival for the depiction of provincial life.

But these scores should not be subjected to such close scrutiny. If we are looking for a proxy metric for the “impact factor” of these works, it’s not implausible to be guided by the numbers of different editions of them that have accumulated in our great national libraries. If we say that a score of less than (say) 20 suggests a low impact, and anything above (say) 50 a high one, we should not go too far wrong.
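
As a sketch of that banding, with the two cut-offs taken from the paragraph above and the function name and the label for the middle band invented for illustration:

```python
def impact_band(score, low=20, high=50):
    """Band a record count: below `low` is low impact, above `high` is high."""
    if score < low:
        return "low"
    if score > high:
        return "high"
    return "middling"

# Scores quoted earlier in this post.
for title, score in [("Middlemarch", 77), ("The Sign of Four", 72), ("Mary Barton", 82)]:
    print(title, score, impact_band(score))
```

Both cut-offs are parameters because the numbers are only a first guess and may need adjusting for other libraries.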

So far I have tested this procedure only on the British Library’s collection. An obvious next step is to try a different English-language library (COPAC springs to mind) to check that the ranking is not too wildly different. And then to try out a different language: the BNF also has a Z39.50 server, so I plan to subject the French collection to the same treatment.

