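A dictionary describes languages, and this document describes OmegaWiki's concept of language. OmegaWiki bases that concept on the ISO 639-3 standard, with some variations. The languages themselves are recorded in a Language table in the database.
OmegaWiki's concept of language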
SIL has this to say about the scope of language identifiers for individual languages:
- "There is no one definition of "language" that is agreed upon by all and appropriate for all purposes. As a result, there can be disagreement, even among speakers or linguistic experts, as to whether two varieties represent dialects of a single language or two distinct languages. For this part of ISO 639, judgments regarding when two varieties are considered to be the same or different languages are based on a number of factors, including linguistic similarity, intelligibility, a common literature, the views of speakers concerning the relationship between language and identity, and other factors."
The ISO 639-3 standard also contains codes for macrolanguages, that is:
- "...clusters of closely-related language varieties that, based on the criteria discussed above, can be considered distinct individual languages, yet in certain usage contexts a single language identity for all is needed."
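To make the macrolanguage idea concrete, here is a minimal Python sketch. The mapping is a small, deliberately partial excerpt: Azerbaijani "aze" groups South Azerbaijani "azb" and North Azerbaijani "azj", and Chinese "zho" groups Mandarin "cmn", Cantonese "yue" and several other languages not listed here. The function name `macrolanguage_of` is hypothetical and not part of any OmegaWiki or ISO tooling.

```python
from typing import Optional

# Illustrative, deliberately partial mapping from ISO 639-3 macrolanguage codes
# to a few of their member individual-language codes.
MACROLANGUAGES = {
    "aze": {"azb", "azj"},  # Azerbaijani: South and North Azerbaijani
    "zho": {"cmn", "yue"},  # Chinese: Mandarin, Cantonese (many more omitted)
}


def macrolanguage_of(code: str) -> Optional[str]:
    """Return the macrolanguage listed as containing `code`, if any."""
    for macro, members in MACROLANGUAGES.items():
        if code in members:
            return macro
    return None


print(macrolanguage_of("azb"))  # aze
print(macrolanguage_of("eng"))  # None
```

A lookup like this lets a tool fall back from an individual language to its macrolanguage when "a single language identity for all is needed", without merging the individual codes.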
The ISO 639-3 standard explicitly declines to encode dialects, where "...the term dialect is used as in the field of linguistics where it simply identifies any sub-variety of a language such as might be based on geographic region, age, gender, social class, time period, or the like." Thus it has one language code "eng" (English) which covers usage in the UK, USA, and some 104 other countries, including dozens of dialects from Cockney to Black English.
However, OmegaWiki finds it important to record distinctions between some local dialects. For example, it has separate language codes for "English", "English (United Kingdom)", and "English (United States)". Decisions to distinguish dialects in OmegaWiki by adding new language codes are made by the project leadership on a case-by-case basis.
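OmegaWiki may move toward ISO 639-6, because that standard is designed as a list of language entities. The aim of ISO 639-6 is to provide codes for "comprehensive coverage of language variants". It is expected to include codes for everything registered in ISO 639-3, plus dialects and other varieties outside the scope of ISO 639-3. ISO 639-6 was still a draft as of June 2007, so it was expected to take months or perhaps years of work before it became a final standard.
Language, script and orthography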
ISO 639-3's scope for language identifiers includes aspects of spoken expression ("intelligibility") and written expression ("literature"), and it covers both with a single language identifier. OmegaWiki's fundamental element, the Expression, is built from written rather than spoken text, and there are occasions where this distinction makes a difference.
The set of written marks, or graphemes, in which a language is written is known as a script. Multiple languages can be written in one script: English and French, for instance, are both written in the Latin script. Conversely, some languages are written in multiple scripts. For instance, "azb" (South Azerbaijani) is written in Arabo-Persian script, Roman-based script and Cyrillic script. Logically, OmegaWiki will eventually contain Expressions of South Azerbaijani text in each of these scripts, presumably all with the same language code. ISO 15924 defines four-letter codes for many scripts (for example "Latn", "Arab" and "Cyrl").
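One way to keep a single language code while still distinguishing scripts is to tag each Expression with an ISO 15924 script code alongside its ISO 639-3 language code. The Python sketch below assumes that design; the `Expression` class, its `script` field and the `scripts_used` helper are hypothetical and do not describe OmegaWiki's actual schema. The sample data is the South Azerbaijani word for "water" written in three scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """Hypothetical record: one written form plus its language and script."""
    spelling: str
    language: str  # ISO 639-3 language code
    script: str    # ISO 15924 four-letter script code


# The same South Azerbaijani word in three scripts, plus an English word.
EXPRESSIONS = [
    Expression("su", "azb", "Latn"),
    Expression("су", "azb", "Cyrl"),
    Expression("سو", "azb", "Arab"),
    Expression("water", "eng", "Latn"),
]


def scripts_used(expressions: list, language: str) -> set:
    """Collect the ISO 15924 script codes used by one language's Expressions."""
    return {e.script for e in expressions if e.language == language}


print(sorted(scripts_used(EXPRESSIONS, "azb")))  # ['Arab', 'Cyrl', 'Latn']
```

Keeping the script as a separate field, rather than minting a new language code per script, matches the expectation above that all three forms share the code "azb".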
Scripts change over time. For instance, Japan reformed its standard character forms in the late 1940s, and mainland China introduced simplified characters in the 1950s; the character 國 ("country") became 国 in both. Thus some words in both languages had one Expression before the reform and a different Expression after it, both linked to the same DefinedMeaning. For OmegaWiki, this has consequences similar to those of historical spellings of English words.
Orthography is the set of rules for expressing a certain language in written form, including the choice of script and how that script is used. Orthography also changes over time. For instance, the German spelling reform of 1996 changed the rules for writing German.
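Both kinds of change, a script reform and an orthography reform, leave several Expressions attached to one DefinedMeaning in the same language. The Python sketch below groups such rows; the DefinedMeaning identifiers ("DM-country", "DM-that"), the row layout and the `spellings_by_meaning` helper are hypothetical placeholders, not real OmegaWiki data or functions. The examples are Japanese 國/国 and German "daß"/"dass" (the conjunction "that"), which the 1996 reform respelled.

```python
from collections import defaultdict

# Hypothetical rows: (DefinedMeaning id, ISO 639-3 code, spelling, note).
# The DefinedMeaning ids are placeholders, not real OmegaWiki ids.
ROWS = [
    ("DM-country", "jpn", "國", "traditional (kyūjitai) form"),
    ("DM-country", "jpn", "国", "reformed (shinjitai) form"),
    ("DM-that", "deu", "daß", "spelling before the 1996 reform"),
    ("DM-that", "deu", "dass", "spelling after the 1996 reform"),
]


def spellings_by_meaning(rows):
    """Group every spelling under its (DefinedMeaning, language) pair."""
    grouped = defaultdict(list)
    for meaning, language, spelling, note in rows:
        grouped[(meaning, language)].append((spelling, note))
    return dict(grouped)


for key, forms in spellings_by_meaning(ROWS).items():
    print(key, [spelling for spelling, _ in forms])
```

Recording a note on each form, instead of deleting the older spelling, keeps historical texts searchable while still pointing both forms at one meaning.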
Unwritten languages
Some languages, such as American Sign Language ("ase"), are neither spoken nor written. Although OmegaWiki is a dictionary of written expression, it aims eventually to catalog sign languages as well, so some unwritten languages are in scope for OmegaWiki.
The Sutton SignWriting script provides a way to record sign languages in writing. When this page was written, Sutton SignWriting was not yet encoded in Unicode, which made it technically difficult to store in the database; it was later added in Unicode 8.0 (2015). ISO 639-3 does, however, assign language codes to many sign languages.
Details of OmegaWiki's use of ISO 639-3
OmegaWiki will use the ISO 639-3 codes qaa through qtz, which the standard reserves for local use, to label specific kinds of content. This helps us deal with expressions that have a "grammar" but are not necessarily languages used for general communication. ISO 639-3 says these codes are for "cases in which there is no suitable existing code".
For chemical formulas such as H2O, we will use the code "mul" or "zxx", because such content is not specific to any one language. ISO 639-3 defines "mul" as indicating "many languages are used and it is not practical to specify all the appropriate language codes". ISO 639-3 defines "zxx" as for "a situation in which a language identifier is required by system definition, but the item being described does not actually contain linguistic content."
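The local-use range and the two special codes can be recognized mechanically. In the minimal Python sketch below, "qaa" through "qtz" means a lowercase "q", then a letter from "a" to "t", then any letter "a" to "z", which gives 20 × 26 = 520 codes. The function names are hypothetical, and `describe_code` does not check other codes against the ISO 639-3 registry.

```python
def is_local_use(code: str) -> bool:
    """True if `code` lies in the ISO 639-3 local-use range qaa..qtz."""
    return (
        len(code) == 3
        and code[0] == "q"
        and "a" <= code[1] <= "t"
        and "a" <= code[2] <= "z"
    )


# The two special-purpose codes discussed above, with ISO 639-3's meaning.
SPECIAL_CODES = {
    "mul": "multiple languages",
    "zxx": "no linguistic content",
}


def describe_code(code: str) -> str:
    """Label a code as local use, one of the special codes, or something else."""
    if is_local_use(code):
        return "local use"
    return SPECIAL_CODES.get(code, "other code (not checked against the registry)")


for c in ("qaa", "qtz", "qua", "zxx", "eng"):
    print(c, "->", describe_code(c))
```

Note that "qua" falls outside the range because its second letter comes after "t".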
The Language table is the database table that lists the languages in which Expressions can be added. Without a record in this table, words cannot be added for that language or dialect. Adding a Language is different from adding Babel templates or Portals: Languages can only be added by bureaucrats.
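The two rules in this paragraph (no Expression without a Language record, and only bureaucrats may add Languages) can be sketched with Python's built-in `sqlite3` module. The table layout, column names and the `add_language` / `add_expression` helpers are hypothetical illustrations and do not reproduce OmegaWiki's real schema or permission system.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical language/expression tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE language (
            iso639_3 TEXT PRIMARY KEY,
            name     TEXT NOT NULL
        );
        CREATE TABLE expression (
            spelling TEXT NOT NULL,
            language TEXT NOT NULL
        );
        """
    )
    return conn


def add_language(conn, code, name, is_bureaucrat):
    """Only bureaucrats may add a row to the Language table."""
    if not is_bureaucrat:
        raise PermissionError("only bureaucrats can add Languages")
    conn.execute("INSERT INTO language (iso639_3, name) VALUES (?, ?)", (code, name))


def add_expression(conn, spelling, code):
    """Refuse an Expression whose language has no record in the Language table."""
    known = conn.execute(
        "SELECT 1 FROM language WHERE iso639_3 = ?", (code,)
    ).fetchone()
    if known is None:
        raise LookupError(f"no Language record for {code!r}")
    conn.execute(
        "INSERT INTO expression (spelling, language) VALUES (?, ?)", (spelling, code)
    )


conn = make_db()
add_language(conn, "eng", "English", is_bureaucrat=True)
add_expression(conn, "water", "eng")   # accepted
try:
    add_expression(conn, "su", "azb")  # rejected: "azb" is not in the table
except LookupError as err:
    print(err)                         # no Language record for 'azb'
```

Checking the Language table before every insert is the simplest way to express the rule; a real deployment might enforce it with a foreign-key constraint instead.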
Lists of languages in use on OmegaWiki
- Editable languages is a page with a list of language names. It appears to be updated automatically every few days. Each entry consists of a link to the portal for the language's ISO 639-3 three-letter code and a link to the DefinedMeaning of the language's name as written in that language itself. However, the list is incomplete; for instance, it does not list English (United Kingdom) or English (United States).
- Omegawiki.org's statistics for "Number of Expressions per language" show a list of languages actually used in OmegaWiki, including English (United Kingdom) and English (United States).