Skaldic Poetry of the Scandinavian Middle Ages

B. Unicode conversion of database

This is not currently part of the peer-reviewed material of the project. Do not cite as a research publication.

The database was converted to Unicode (UTF-8 encoding) on 24 August 2009. This has caused some changes to the way the database works.

It is recommended that you install a MUFI-compliant Unicode font such as Junicode or Andron Scriptor Web. Assistants on the project should contact Tarrin to obtain the font used for quality control (Adobe Garamond Pro).

Notes:

  • Text is very unlikely to have been lost during these changes. If something appears to have disappeared, it is probably because the text is not displaying properly rather than the text having been deleted. In any case, everything has been backed up and can be recovered.
  • The database now uses modern Icelandic as the language collation, that is, the system for searching and ordering text. This means that vowels with accents are no longer treated the same as vowels without accents when searching and ordering. In alphabetical order, for example, 'Auðunn' comes before 'Ámundi', and a search on 'armoðr' no longer matches 'Ármóðr'. If you can't produce the correct letter, you can use the underscore character in searches as a single-character wildcard, e.g. '_mundi' matches 'Ámundi'. The search fields and some browsing options (e.g. browsing by first letter) reflect these changes, and buttons are now provided to insert the non-English characters. For both searching and ordering, æ and œ are equivalent, as are ø and ö. For searching, o and ǫ are also equivalent; in some cases the alphabetical ordering is adjusted so that ǫ is equivalent to ø/ö.
  • ReykholtTimes is no longer used as a font internally in the database. You can continue to use ReykholtTimes, and documents produced in ReykholtTimes, to enter and edit data. The database converts such fields to Unicode for storage and converts them back again for editing. If you want to enter or edit a verse in Unicode instead of ReykholtTimes, you can do so by setting an option in the preferences form. Some other fields that use ReykholtTimes can also be entered in Unicode through alternative forms.
  • The old quality control (‘print-friendly’) format has been replaced with the format tabled at the meeting in Uppsala. This uses the book font and encoding, and is much closer to the final product than the old QC format, making QC more reliable. If you have any problems with the format, let Tarrin know (but please install the Garamond Pro font first!).
  • The character o-hook-acute (ǫ́) is not handled consistently. Unicode has no single precomposed code point for this letter: it can be formed from o-hook followed by a combining acute accent, or from o-acute followed by a combining hook (ogonek). The MUFI project defines a code point for it in the Private Use Area, but this does not work with non-MUFI fonts (i.e. almost all pre-installed fonts). Using the combining sequences, o-hook-acute can be displayed correctly, but searching and ordering may be unreliable. In most fields it is encoded as ô; in some fields formerly using ReykholtTimes it is encoded as o-hook-macron (ǭ), and in others as a custom character. Until this issue is resolved, the character may still display incorrectly in some places. However, it is stored unambiguously, if not consistently, and no data will be lost or corrupted. If you need to insert or edit this character, use o-circumflex (ô and Ô) to represent it for the time being.
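
To make the searching and ordering rules above more concrete, here is a small Python sketch. It is not the database's actual implementation. The letter order (the Icelandic alphabet, with c, q, w and z slotted in for loanwords) and the names fold, sort_key and like are assumptions for illustration only, and MySQL's utf8_icelandic_ci collation has its own weight tables that this does not reproduce exactly. It applies the equivalences described above, treating ǫ as o when searching and as ö when ordering (the adjusted variant).

```python
import re
import unicodedata

# Assumed letter order: the Icelandic alphabet, with c, q, w and z added for
# loanwords. MySQL's utf8_icelandic_ci uses its own weights; this only
# approximates the behaviour described in the notes.
ICELANDIC_ORDER = "aábcdðeéfghiíjklmnoópqrstuúvwxyýzþæö"
RANK = {ch: i for i, ch in enumerate(ICELANDIC_ORDER)}

# Equivalences from the notes: æ = œ and ø = ö everywhere; ǫ = o when
# searching, and ǫ = ö in the adjusted ordering variant.
SEARCH_FOLD = {"œ": "æ", "ø": "ö", "ǫ": "o"}
ORDER_FOLD = {"œ": "æ", "ø": "ö", "ǫ": "ö"}


def fold(text: str, table: dict[str, str]) -> str:
    """Compose (NFC), lowercase and apply equivalences; accents stay distinct."""
    text = unicodedata.normalize("NFC", text).lower()
    return "".join(table.get(ch, ch) for ch in text)


def sort_key(text: str) -> list[int]:
    """Rank letters by Icelandic position; any other character sorts after them."""
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in fold(text, ORDER_FOLD)]


def like(pattern: str, text: str) -> bool:
    """Rough emulation of SQL LIKE: '_' is one character, '%' any run."""
    regex = "".join(
        "." if ch == "_" else ".*" if ch == "%" else re.escape(ch)
        for ch in fold(pattern, SEARCH_FOLD)
    )
    return re.fullmatch(regex, fold(text, SEARCH_FOLD), flags=re.DOTALL) is not None
```

With these rules, sorted(["Ámundi", "Auðunn"], key=sort_key) puts 'Auðunn' first, like("armoðr", "Ármóðr") is False, and like("_mundi", "Ámundi") is True, matching the behaviour described in the notes.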



Other notes...

Known bugs and things to do:
  • still uncertain how to treat o-hook-acute (in some fields it is encoded as o-circumflex; in former ReykholtTimes fields, as o-ogonek-macron)
  • prose order and translation forms are not fully generalised to use the levels-definition fields (to determine auto-filled text, reordering, etc.)
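
The o-hook-acute problem can be checked with Python's standard unicodedata module. Both ways of building the letter from Unicode pieces normalize (NFC) to the same two code points, ǫ (U+01EB) followed by a combining acute (U+0301), because Unicode has no precomposed form. The to_canonical helper and the two placeholder tables below are hypothetical. They assume that, in the fields concerned, ô/Ô and ǭ/Ǭ are used only as stand-ins for o-hook-acute, and they leave the undocumented custom character alone.

```python
import unicodedata

# ǫ́ has no precomposed code point: its NFC form is ǫ (U+01EB) + U+0301.
O_HOOK_ACUTE = "\u01eb\u0301"
O_HOOK_ACUTE_UPPER = "\u01ea\u0301"  # Ǫ + combining acute

# Two ways of building the letter from Unicode pieces.
FROM_O_HOOK = "o\u0328\u0301"  # o + combining ogonek (hook) + combining acute
FROM_O_ACUTE = "\u00f3\u0328"  # ó + combining ogonek (hook)

# Hypothetical tables for the interim encodings, assuming these characters
# stand only for o-hook-acute in the fields concerned. The custom character
# used in some fields is not known here and is left unmapped.
CIRCUMFLEX_FIELDS = {"\u00f4": O_HOOK_ACUTE, "\u00d4": O_HOOK_ACUTE_UPPER}  # ô, Ô
MACRON_FIELDS = {"\u01ed": O_HOOK_ACUTE, "\u01ec": O_HOOK_ACUTE_UPPER}  # ǭ, Ǭ


def to_canonical(text: str, placeholders: dict[str, str]) -> str:
    """Normalize to NFC, then replace interim placeholders with ǫ́ / Ǫ́."""
    text = unicodedata.normalize("NFC", text)
    return "".join(placeholders.get(ch, ch) for ch in text)


for form in (FROM_O_HOOK, FROM_O_ACUTE):
    print(ascii(unicodedata.normalize("NFC", form)))  # both print '\u01eb\u0301'
```

Because both combining sequences collapse to one canonical form, normalizing on input would make searches on this letter consistent regardless of how it was typed; the remaining question is only which stored placeholder maps to it.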
Log of changes:
  • 18-21/6/09: preparations: conversion scripts, updates to database
  • 24/6/09: backed up database; converted database to Unicode; started changing web forms and scripts
  • 25/6/09: fixed various problems; backed up database again; converted õ to o-hook in all text fields
Detail of changes:
  • Auto conversions: run convert2utf8.php to do the conversions (see comments); run unicodeencode.php to convert Reykholt fields to UTF-8
  • add to db-vals: $encoding = 'utf8'; $collation = 'utf8_icelandic_ci';
  • add to lib-db, lib-db-edit: mysql_query("SET character_set_results = '$encoding',....");
  • add new rt-uni functions to lib-trans.php
  • add new lines to lib-db-edit.php (see bits marked with #----)
  • update everything in lib/php/view: Windows-1252 > 'UTF-8' [perl -pi~ -e 's/Windows-1252/UTF-8/;' *.php; rm *~]
  • lib/php/view/lib-verses-app.php CHAR(171) > UNHEX('CBA3')
  • php_query - 39 '<sup>x</sup>' > UNHEX('CBA3')
  • *** reykholt/unicode possibilities for extended verse editing forms ***
  • *** what to do with 8-bit encoding õ and ô ??? ***
  • change queries: SELECT id FROM php_view WHERE body_query LIKE '%convert%', php_query.sql, etc. -- ???
  • ReykholtTimes fields: app.corr_from, app.note, app.reading, notes.note, poems.introduction, refs.transcription, skalds.biography, verses.context, verses.editions, verses.intro, verses.lg_alt, verses.run_rdg, verses.run_rdg_notes, verses.skjatext (see //plato/var/local/find_ents_in_reykholt_fields.pl)
  • ReykholtTimes forms: verses...
  • ReykholtTimes conversion function for forms which use RT
  • --- notes ---
  • Reykholt fields only use entities outside the Latin-1 range (assumption, but checked)
  • Reykholt fields duplicated as *_reykholt
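
The round trip described in the notes (ReykholtTimes text converted to Unicode for storage and back again for editing) can be sketched as a pair of table lookups. Everything here is an assumption for illustration: the function names are hypothetical, bytes are taken to be Windows-1252 except where the font redefines a slot, and the table holds only the one slot these notes hint at. The lib-verses-app.php entry suggests byte 171 was used for superscript x, which is stored as U+02E3 (UTF-8 bytes CB A3, i.e. UNHEX('CBA3')). A real table would need an entry for every slot ReykholtTimes redefines.

```python
# Hypothetical, partial ReykholtTimes table: the Windows-1252 character that
# occupies a byte -> the Unicode character the font actually draws there.
# Only byte 171 ('«' in Windows-1252) -> U+02E3 MODIFIER LETTER SMALL X is
# suggested by the notes; every other redefined slot would need an entry.
RT_TO_UNICODE = {
    "\u00ab": "\u02e3",
}
UNICODE_TO_RT = {uni: rt for rt, uni in RT_TO_UNICODE.items()}


def rt_to_unicode(raw: bytes) -> str:
    """Convert ReykholtTimes-era bytes to Unicode text for UTF-8 storage."""
    text = raw.decode("cp1252")  # raises on the five bytes cp1252 leaves undefined
    return "".join(RT_TO_UNICODE.get(ch, ch) for ch in text)


def unicode_to_rt(text: str) -> bytes:
    """Convert stored Unicode text back to ReykholtTimes bytes for editing.

    Raises UnicodeEncodeError for characters with no ReykholtTimes slot.
    A genuine '«' in the Unicode text would also come out as byte 171,
    so the table must not remap slots whose Windows-1252 character is
    still needed in its own right.
    """
    mapped = "".join(UNICODE_TO_RT.get(ch, ch) for ch in text)
    return mapped.encode("cp1252")
```

Here unicode_to_rt fails on characters such as ǫ that have no slot, which is arguably the right behaviour: such text cannot be edited in ReykholtTimes and would need the Unicode entry option instead. Since the original values are kept in the *_reykholt columns, comparing unicode_to_rt(rt_to_unicode(x)) with x for each of them is one plausible way to check a full table.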
