Topic: Skipped events page (Read 4161 times)

Janet Jaguar

  • Global Moderator
  • Hero Member
  • Posts: 10166
  • Smell the sea, feel the sky, & fly into the mystic
Re: Skipped events page
« Reply #15 on: February 13, 2013, 10:19:55 am »
Quote
I don't know when it is effectively too late to adjust your entry. Some notification of that would be nice - just in case someone still wishes to check their entries. I often have to 'finish' a page partway through because I squeeze in some transcribing during my lunch break.

As long as a ship is less than 100% complete, all edits and additions will be included in the final data gathering. You can proofread your whole voyage if you are feeling a bit OCD, and all changes will be used.

Once a 100% ship goes VAL, Philip will consider its data ready for analysis. Any changes made after that point are a gamble: if you get in before the analysis team, your changes are kept. If they have already harvested the data, I doubt they will do a second harvest and a full reanalysis.

Kathy

  • Editor
  • Hero Member
  • Posts: 2761
  • In the beginning was the sea...
Re: Skipped events page
« Reply #16 on: February 13, 2013, 01:24:06 pm »
There really is no way to write a computer program to check the number of entries and, based on that number, tell if a page is incomplete.  For example, there may be 24 weather entries on 5 pages for a ship, and then suddenly, on 1 page, there are 12 for the morning and only 4 for the afternoon - that page would probably be kicked out as incomplete, but was in fact transcribed completely.  There are multiple examples of more or fewer entries on log pages for just the weather alone that would just give any computer drive failure trying to figure everything out.  ;D
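To make the problem concrete, here is a minimal Python sketch of the kind of fixed-count check being ruled out. The function name and the cutoff of 24 entries (taken from the example above) are hypothetical; this is not how Old Weather actually processes pages.

```python
def naive_is_incomplete(entries_on_page: int, usual_count: int = 24) -> bool:
    """Hypothetical naive rule: flag a page as incomplete whenever it holds
    fewer weather entries than the ship usually logs per page."""
    return entries_on_page < usual_count


# The example above: a fully transcribed page with 12 morning entries
# and only 4 afternoon entries.
page_entries = 12 + 4
print(naive_is_incomplete(page_entries))  # True: a false "incomplete" flag
```

A complete but unusual page gets flagged exactly like a half-finished one, which is why a count-based rule like this would misfire.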

okopho

  • Jr. Member
  • Posts: 54
Re: Skipped events page
« Reply #17 on: February 13, 2013, 08:01:34 pm »
Quote
There really is no way to write a computer program to check the number of entries and, based on that number, tell if a page is incomplete.  For example, there may be 24 weather entries on 5 pages for a ship, and then suddenly, on 1 page, there are 12 for the morning and only 4 for the afternoon - that page would probably be kicked out as incomplete, but was in fact transcribed completely.  There are multiple examples of more or fewer entries on log pages for just the weather alone that would just give any computer drive failure trying to figure everything out.  ;D

Hmm. But if that were completely true, then surely crowdsourcing wouldn't work at all? :D

The beauty of crowdsourcing is that the computer program doesn't need to know the precise content of a page in order to assess whether it's "complete". Everything works by consensus. So if the large majority of transcribers agree that there are about 30 entries on a page, then a page that is submitted with significantly fewer entries could be counted as "incomplete".

But I would guess there are no hard and fast rules as to whether it actually should be treated that way. Every project is different, and I've no doubt there are many different variables that need to be juggled in order to find the most effective all-round compromise.
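A minimal Python sketch of that consensus idea, assuming each transcriber's submission for a page can be reduced to a single entry count. The function, the transcriber names, and the 0.5 tolerance standing in for "significantly fewer" are illustrative assumptions, not settings any Zooniverse project is known to use. The key difference from a fixed cutoff is that the expected count comes from the other transcribers of the same page.

```python
from __future__ import annotations

from statistics import median


def flag_incomplete(counts: dict[str, int], tolerance: float = 0.5) -> list[str]:
    """Flag submissions whose entry count falls well below the consensus.

    `counts` maps a (hypothetical) transcriber ID to the number of entries
    they submitted for the same page. The consensus is the median count, and
    a submission is flagged if it has fewer than `tolerance * median` entries.
    """
    consensus = median(counts.values())
    return sorted(user for user, n in counts.items() if n < tolerance * consensus)


submissions = {"alice": 30, "bob": 29, "carol": 31, "dave": 31, "eve": 3}
print(flag_incomplete(submissions))  # ['eve']
```

Because the median ignores a single outlier, one short submission cannot drag the consensus down with it.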

However, that is not what I was getting at when I suggested that the current behaviour of the Finished Button might be "fragile". As others have pointed out (and I alluded to myself), it works just fine when people use it in the way that was intended. Potentially, though, it does seem open to abuse by a hypothetical malicious user who could just click through pages whilst transcribing little or nothing. But probably that possibility has already been thought of, and there are already measures in place to deal with it.

Kathy

Re: Skipped events page
« Reply #18 on: February 13, 2013, 08:51:14 pm »
OK (now, this is not to be construed as anything but musing about the programming ;D)

I think the QC process of 3 people transcribing each page protects against the malicious user. You are right: so far as I know, based on other conversations here in the Forum, the number of entries on the page doesn't matter. The program just compares the transcriptions (based on the hour field) and keeps the transcription for the hour in question if two or more of the entries match. If at least 2 of the entries don't agree, that hour is discarded. Since the odds of 2 people sabotaging the same page are probably slim, the other 2 transcriptions should agree and the hour is accepted. If none of the transcriptions on the page match, the page is kicked out and I think Philip looks at it. Our accuracy rate is very good; when the project first started, pages had to be transcribed by 5 people before they were considered processed.
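The hour-by-hour agreement rule described above can be sketched in Python roughly as follows. This is only an illustration built from the description in this post: the data shape (hour mapped to a tuple of readings), the sample values, and the function name are hypothetical, and the real Old Weather code may work differently.

```python
from __future__ import annotations

from collections import Counter


def reconcile_page(transcriptions: list[dict[int, tuple]], quorum: int = 2):
    """Keep each hour's readings only if at least `quorum` transcriptions agree.

    Each transcription maps an hour to a tuple of recorded values
    (a hypothetical shape). Returns (accepted, needs_review): `accepted`
    maps hour -> agreed values, and `needs_review` is True when no hour
    reached agreement, i.e. the page would be kicked out for a human look.
    """
    hours = set()
    for t in transcriptions:
        hours.update(t)

    accepted = {}
    for hour in sorted(hours):
        votes = Counter(t[hour] for t in transcriptions if hour in t)
        value, n = votes.most_common(1)[0]
        if n >= quorum:
            accepted[hour] = value
        # Otherwise there is no agreement and the hour is discarded.
    return accepted, not accepted


honest_a = {0: ("29.92", "NW"), 4: ("29.90", "W")}
honest_b = {0: ("29.92", "NW"), 4: ("29.90", "W")}
saboteur = {}  # clicked "finished" without transcribing anything
accepted, needs_review = reconcile_page([honest_a, honest_b, saboteur])
print(accepted)      # {0: ('29.92', 'NW'), 4: ('29.90', 'W')}
print(needs_review)  # False
```

With three transcribers and a quorum of 2, a single blank or junk submission simply gets outvoted, which matches the point about the malicious user.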

Other crowdsourcing projects may work differently - I don't know what the QC standard for other Zooniverse projects is - but because of the way QC works for Old Weather, consensus on the number of weather entries on a page doesn't matter. There is also no QC for the Events page; right now, events are cleaned up in a separate editing process run by Gordon Smith.

Anyway, thanks for the conversation - I'm a database geek - I love this sort of thing -  ;D

okopho

Re: Skipped events page
« Reply #19 on: February 13, 2013, 09:39:32 pm »
Quote
Other crowdsourcing projects may work differently - I don't know what the QC standard for other Zooniverse projects is

I think the thresholds vary quite a lot. It's 5 on Snapshot Serengeti, but I think I read that it's something like 85 on Planet Hunters!

Quote
I'm a database geek - I love this sort of thing -  ;D

Yes, I do quite a lot of programming myself, so I find it hard to curb my curiosity about how things work. ;D

Kathy

Re: Skipped events page
« Reply #20 on: February 13, 2013, 09:43:12 pm »
 ;D ;)

Janet Jaguar

Re: Skipped events page
« Reply #21 on: February 13, 2013, 11:10:04 pm »
This project started with 5 transcribers per page, and was reduced to 3 when we proved to be extraordinarily accurate. Snapshot Serengeti uses between 5 and 8 classifications on each picture, depending on how much disagreement there is - but they are working only on a multiple-guess basis. Totally different from our transcribing.
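For contrast, an adaptive rule of the kind described for Snapshot Serengeti might look something like this Python sketch. Only the 5-to-8 range comes from the post; the 80% agreement threshold, the stopping logic, and the function name are assumptions made for illustration, and the project's actual rules may differ.

```python
from __future__ import annotations

from collections import Counter


def should_retire(labels: list[str], minimum: int = 5, maximum: int = 8,
                  agreement: float = 0.8) -> bool:
    """Hypothetical adaptive retirement rule for a multiple-choice task.

    An image needs at least `minimum` classifications. It retires early once
    the leading answer reaches the `agreement` share of votes, and always
    retires once `maximum` classifications have been collected.
    """
    n = len(labels)
    if n < minimum:
        return False
    if n >= maximum:
        return True
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / n >= agreement


print(should_retire(["zebra"] * 5))                            # True
print(should_retire(["zebra"] * 3 + ["wildebeest"] * 2))       # False
print(should_retire(["zebra"] * 5 + ["wildebeest"] * 3))       # True (cap reached)
```

Clear-cut images stop at the minimum while disputed ones collect more votes, which is cheap to do when each answer is a single choice but harder to apply to free-text transcription.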

"Better than the Defence" is a blog post Philip wrote explaining the statistics behind the decision to reduce to 3.

A much newer re-examination of the issue is in this post: http://forum.oldweather.org/index.php?topic=5.msg55684#msg55684

And this is a rough draft of a published paper he wrote on the whole process:  OldWeather.org: citizen science for climate reconstruction