User talk:Kipcool/stats

scripts

list of expressions

sql query, fast

USE omegawiki ;
SELECT  spelling, language_id
FROM uw_expression_ns
WHERE expression_id IN
(
SELECT DISTINCT expression_id
FROM uw_syntrans
WHERE remove_transaction_id IS NULL
) ;

Remark: this query returns one row per expression per language. The word "toto" appears twice if it exists in two languages, but only once if it has two definitions in one language. The statistics at http://www.omegawiki.org/extensions/Wikidata/util/stats.php do not take remove_transaction_id into account, so they report a number of expressions that is too high, equivalent to the result of:

USE omegawiki ;
SELECT  spelling, language_id
FROM uw_expression_ns ;
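
The difference between the two queries can be checked on toy data. The following is a minimal sketch in Python with an in-memory SQLite database, not the real OmegaWiki schema: the table and column names come from the queries above, while the rows, the language ids 85 and 86, and the defined_meaning_id column are invented for illustration.

```python
import sqlite3

# Toy database: only the columns used by the queries above are modelled.
# All rows (and the language ids 85/86) are made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uw_expression_ns (expression_id INTEGER, spelling TEXT, language_id INTEGER);
CREATE TABLE uw_syntrans (expression_id INTEGER, defined_meaning_id INTEGER,
                          remove_transaction_id INTEGER);
INSERT INTO uw_expression_ns VALUES
  (1, 'toto', 85),   -- 'toto' in a first language
  (2, 'toto', 86),   -- 'toto' in a second language
  (3, 'old',  85);   -- expression whose only syntrans was removed
INSERT INTO uw_syntrans VALUES
  (1, 100, NULL),    -- first definition of 'toto' in language 85
  (1, 101, NULL),    -- second definition, same expression and language
  (2, 102, NULL),
  (3, 103, 555);     -- removed: remove_transaction_id is set
""")

# Query from this page: only expressions with at least one live syntrans.
live = conn.execute("""
    SELECT spelling, language_id
    FROM uw_expression_ns
    WHERE expression_id IN (
        SELECT DISTINCT expression_id
        FROM uw_syntrans
        WHERE remove_transaction_id IS NULL
    )
""").fetchall()

# Equivalent of the stats.php count: every row, removed or not.
all_rows = conn.execute(
    "SELECT spelling, language_id FROM uw_expression_ns"
).fetchall()

print(len(live), len(all_rows))  # prints: 2 3
```

In this toy data "toto" is counted twice (two languages) even though it has two definitions in one of them, and the removed expression only shows up in the unfiltered count.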

http queries, slow

Note: the script stops if it encounters an expression containing a "[". In that case, you have to skip that expression manually.

#!/usr/bin/perl

use LWP::Simple ;

open (OUTPUT, ">WZlist.txt") or die "Couldn't open WZlist.txt: $!" ;
# 1 : get the list of pages

$continue = 1 ;

$starturl = 'http://www.OmegaWiki.org/index.php?title=Special%3AAllpages&from=%21&namespace=16&columns=3' ;

print "start URL: $starturl\n" ;

$lapage = get $starturl ;
die "Couldn't get $starturl" unless defined $lapage;

# print $lapage ;

while ( $continue == 1 ) {

  $allarticle = "" ;
  while ($lapage =~ m/title="(OmegaWiki:.*?)">/g) {
    $allarticle .= "http://www.OmegaWiki.org/$1\n" ;
  }

  print OUTPUT $allarticle ;

  # In the page source the link is HTML-escaped, so "&" appears as "&amp;"
  if ( $lapage =~ /(title=Special:Allpages&amp;from=[^&]*&amp;namespace=16&amp;columns=3)" title="Special:Allpages">Next page/ ) {
    $nexturl = $1 ;
    $nexturl =~ s/&amp;/&/g ;
    $nexturl =~ s/:/%3A/g ;
    $nexturl = 'http://www.OmegaWiki.org/index.php?'.$nexturl ;
    print "$nexturl\n" ;

    $lapage = get $nexturl ;
# print $lapage ;
    die "Couldn't get $nexturl" unless defined $lapage;
  } else {
    $continue = 0 ;
  }
}

close (OUTPUT) ;
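
The two parsing steps of the script (collecting page titles, then finding the "Next page" link) can be tried without any network access. The following is a minimal sketch in Python. The HTML snippet is hypothetical, modelled only on the patterns the regular expressions above expect, and the function names page_titles and next_page_url are made up for this example.

```python
import html
import re

BASE = "http://www.OmegaWiki.org/index.php?"

def page_titles(page, prefix="OmegaWiki:"):
    """Return every title attribute that starts with the given namespace prefix."""
    return re.findall(r'title="(' + re.escape(prefix) + r'.*?)">', page)

def next_page_url(page, namespace=16):
    """Return the URL of the 'Next page' link, or None on the last page."""
    pattern = (
        r'(title=Special:Allpages&amp;from=[^&]*'
        rf'&amp;namespace={namespace}&amp;columns=3)'
        r'" title="Special:Allpages">Next page'
    )
    m = re.search(pattern, page)
    if m is None:
        return None
    query = html.unescape(m.group(1))        # "&amp;" becomes "&"
    return BASE + query.replace(":", "%3A")  # same encoding step as the Perl script

# Hypothetical page fragment, assumed to resemble the Special:Allpages output.
sample = (
    '<a href="/OmegaWiki:tata" title="OmegaWiki:tata">tata</a>\n'
    '<a href="/OmegaWiki:toto" title="OmegaWiki:toto">toto</a>\n'
    '<a href="/index.php?title=Special:Allpages&amp;from=tutu&amp;namespace=16'
    '&amp;columns=3" title="Special:Allpages">Next page (tutu)</a>\n'
)

print(page_titles(sample))
print(next_page_url(sample))
```

Passing prefix="DefinedMeaning:" and namespace=24 would give the variant used for the list of DMs.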

list of DMs

sql query

USE omegawiki ;
SELECT DISTINCT defined_meaning_id
FROM uw_defined_meaning
WHERE remove_transaction_id IS NULL ;


http queries, slow

#!/usr/bin/perl

use LWP::Simple ;

open (OUTPUT, ">DMlist.txt") or die "Couldn't open DMlist.txt: $!" ;
# 1 : get the list of pages

$continue = 1 ;

$starturl = 'http://www.OmegaWiki.org/index.php?title=Special%3AAllpages&from=%21&namespace=24&columns=3' ;

print "start URL: $starturl\n" ;

$lapage = get $starturl ;
die "Couldn't get $starturl" unless defined $lapage;

# print $lapage ;

while ( $continue == 1 ) {

  $allarticle = "" ;
  while ($lapage =~ m/title="(DefinedMeaning:.*?)">/g) {
    $allarticle .= "http://www.OmegaWiki.org/$1\n" ;
  }

  print OUTPUT $allarticle ;

  # In the page source the link is HTML-escaped, so "&" appears as "&amp;"
  if ( $lapage =~ /(title=Special:Allpages&amp;from=[^&]*&amp;namespace=24&amp;columns=3)" title="Special:Allpages">Next page/ ) {
    $nexturl = $1 ;
    $nexturl =~ s/&amp;/&/g ;
    $nexturl =~ s/:/%3A/g ;
    $nexturl = 'http://www.OmegaWiki.org/index.php?'.$nexturl ;
    print "$nexturl\n" ;

    $lapage = get $nexturl ;
# print $lapage ;
    die "Couldn't get $nexturl" unless defined $lapage;
  } else {
    $continue = 0 ;
  }
}

close (OUTPUT) ;

DMs

The number of DMs on Google is also about the same as your count (even a bit higher).

date         #Expression  #DefinedMeanings  Yahoo: [1]  Google EXPR: [2]  Google DM: [3]
23 jan 2007  179'890      12'470            21'700      172'000           12'600

HenkvD 15:07, 23 January 2007 (EST)

Updating?

When I was actively adding Khmer synonyms and definitions, I found it very gratifying to see the numbers mount up in your statistics. I suppose I could figure out how to run the queries posted here, but that's no real replacement for checking your page weekly. Won't you resume running (and posting) these statistics? Rsperberg 08:54, 11 June 2008 (EDT)

I saw your message only today... I will see what I can do, but from what I heard, the dump is broken at the moment. Kipcool 12:32, 3 July 2008 (EDT)
Dump repaired, stats updated. Kipcool 14:58, 8 July 2008 (EDT)