Author Topic: Suggestions for Phase 4  (Read 30449 times)

Craig

  • Shipherd
  • Hero Member
  • *****
  • Posts: 3360
Re: Suggestions for Phase 4
« Reply #15 on: September 07, 2013, 05:57:17 pm »
To get an idea if prefilling the current hour with previous hour data is worthwhile, I compiled the number of changes for the following six ships. Of course, this is just one day out of each ship's log so it is not representative. However, it does give some idea of the variety one might expect:

Number of changes in 24 hours, by column (in order: Wind dir, Force, Height, Ther Att., Air, Bulb, Water, Wthr Code, Cloud code, Clear Sky)

Ship       | Changes per column (digits run together) | Average changes per row in 24 hrs | Number of filled columns | % of cols changed
Jeannette  | 9255333440                               | 1.6                               | 10                       | 16%
Concord    | 4109981010444                            | 3.2                               | 10                       | 32%
Pioneer    | 911916124                                | 2.8                               | 7                        | 40%
Charleston | 491810121364614                          | 4.1                               | 10                       | 41%
Yorktown   | 1391211131571118                         | 4.6                               | 9                        | 51%
Thetis     | 811231419190512                          | 4.7                               | 9                        | 52%

(sorry about the formatting. I don't know how to wrap text. Actually, I do but I was too lazy  ;D)

You can see that the highest average percentage of cells that change is 52%.  However, even having to change half the cells in a row is faster than entering all of them. Note that I didn't count the change from a value to a ditto as a real change. Before Philip changed the software to accept dittos we entered values so perhaps this could be another TWYS exception.                                              
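For anyone who wants to repeat the count on another day's log, here is a rough Python sketch of the tally described above. It is only an illustration: the `DITTO` marker and both function names are made up for this example, and it assumes each column is a list of the 24 hourly entries as typed. A ditto, or a value repeated from the previous hour, is not counted as a change.

```python
DITTO = '"'  # hypothetical marker for a ditto entry; an assumption for this sketch


def count_changes(column):
    """Count hour-to-hour changes in one column.

    A ditto or a repeated value does not count as a change.
    """
    changes = 0
    previous = None
    for value in column:
        if value == DITTO:
            continue  # a ditto repeats the previous value
        if previous is not None and value != previous:
            changes += 1
        previous = value
    return changes


def percent_cells_changed(columns):
    """Average changes per row divided by the number of filled columns, in percent."""
    rows = len(next(iter(columns.values())))
    total = sum(count_changes(col) for col in columns.values())
    average_per_row = total / rows
    return 100 * average_per_row / len(columns)


# Tiny made-up example: two columns, four hours.
sample = {
    "wind_dir": ["NW", DITTO, "N", "N"],
    "air": [70, 71, 72, 72],
}
print(count_changes(sample["wind_dir"]))  # 1
print(percent_cells_changed(sample))      # 37.5
```

The last column of the table is the same calculation: for example, 1.6 changes per row over 10 filled columns gives 16%.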
« Last Edit: September 07, 2013, 06:19:12 pm by Craig »

Janet Jaguar

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 10205
  • Smell the sea, feel the sky, & fly into the mystic
Re: Suggestions for Phase 4
« Reply #16 on: September 07, 2013, 07:54:13 pm »
How do you guard against the very real (as in I've done it to myself when trying that repeat-and-change in forms at work) chance that when 9 of the 10 readings are duplicated, you'll miss/forget to change the odd one out?

When training, I'm aware of having to teach safety against 3 kinds of errors.
  • Beginner's errors all come from ignorance, and disappear when the operator's shiny newness gets rubbed off by experience.
  • Human errors come from the inherent psychology of being born, and are a risk to every operator always.
  • Old timer's errors all come from being overly confident, or from the inherent blindness we all have to unexpected changes around us.  Beginners don't want to learn these safeguards because everything looks so fresh, they don't believe things will be missed.

This forgetting the one that needs changing is definitely an old timer's error.  It's related to highway hypnosis - you KNOW what is supposed to be out there so you stop really looking.

I'm thinking to avoid this, you would have to fill in your changed data, and THEN signal you want the rest copied automatically.

Craig

  • Shipherd
  • Hero Member
  • *****
  • Posts: 3360
Re: Suggestions for Phase 4
« Reply #17 on: September 07, 2013, 09:10:03 pm »
We have three transcribers, in principle. The probability that they all make the same mistake is very low.

The filled data could be in a different colour or font size so that it is obvious which ones you have modified. That might help break the hypnosis effect.

Your solution could work too, Janet. It might require some monitoring at first to see what works best and if the error rate is acceptable.

My impression is that this could be implemented easily. If so, a few of us could experiment with it.


Janet Jaguar

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 10205
  • Smell the sea, feel the sky, & fly into the mystic
Re: Suggestions for Phase 4
« Reply #18 on: September 07, 2013, 09:14:20 pm »
The lovely thing is, this is all theoretical brainstorming.  Trying on odd methods of accomplishing what we want is free.  ;)

AvastMH

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 7328
Re: Suggestions for Phase 4
« Reply #19 on: September 07, 2013, 11:04:50 pm »
I've stood well apart from these discussions so far since I resist changing from a purist approach of having to enter everything. However, we do need to be mindful of creative ways to speed up what we do..and to keep the curious so interested that they join and stay.
It seems to me that we are likely to have a lot of folk who part complete a page, close it and go - that page never gets a re-run for completion does it? Certainly when I was watching stats earlier in ph3 there were folk just completing 6 lines and giving up.
Given that inaccuracies can slip in whether you start with a page that's blank, or a page that requires amendment, the latter might be more tempting to stay with.
Please - any way that we could do a brief comparative study of either method?

There are only about 25 of us really active day to day now...with a planet to save.

Pommy Stuart

  • Shipherd
  • Hero Member
  • *****
  • Posts: 3595
  • A closed mouth gathers no foot.
Re: Suggestions for Phase 4
« Reply #20 on: September 08, 2013, 12:33:54 am »
With apparently so few of us doing this, now may be the time to YET AGAIN suggest only 6 obs per day. (Mid, 4, 8, Noon, 16, 20hrs)
Surely that is enough to get the trend and keep people in the program. One person (who shall remain naimless) has said to me that they are getting 'fed up' (or similar words) with the 24 / day obs.


Janet Jaguar

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 10205
  • Smell the sea, feel the sky, & fly into the mystic
Re: Suggestions for Phase 4
« Reply #21 on: September 08, 2013, 12:49:27 am »
You always may suggest.  And the team is looking for ways to break pages up into watches, so transcribers can do much fewer at a time.

Until then:
http://oldweather.org/transcriptions/5225c2d8f277332d58003cbc/edit

Here's a perfect example of redundant information. If the ship had been in port there would be a report for each watch with exactly the same information that is in this table. Philip would have to assume that the wind direction was the same for each hour between the watches. In other words, he would assume dittos. So there is no point in me wasting time entering thousands of redundant dittos with corresponding hours when this is what  Philip will assume anyway in their absence.

Of course, if there had been blanks then this raises some doubt but there are very few blanks in the Jamestown's wind direction column. But even if there were blanks, what else could Philip assume but the wind direction that was last recorded?

What we are trying to do here is to extract all the weather information in the logs. This information then goes into the permanent marine climate database, and generations of future researchers will use it for all sorts of purposes. Each researcher will use a different subset of the observations, and make different assumptions about missing data.

When we have transcribed observations in the past, we've always left out information that was of low current priority, and we've always ended up regretting not entering what we left out. (I'm looking at the moment at some transcriptions made in the 1950s  - onto punched cards - which left out ship names, data in port, and rounded the positions to the nearest degree - these omissions made sense then, but they are a terrible nuisance now).

So for this reason, please transcribe all the weather observations - even those that look almost useless to us now. I'm betting that even the dittos will be vital to someone at some point.

Hi Janet -

Dittos are not the same as missing values (blank). If they are not entered reliably across the project then there is a serious risk of getting bad data where none should be (and this would be bad).

- Kevin
I do agree, by the way, that our current interface is rather inefficient for inputting observations that don't change (much) from hour to hour. (Yet another issue we never thought of when designing it). If this is going to occur often (probably in early logs) we need to do something about it.

The solution is not to skip observations, but to improve the interface. I'm not promising to make a change soon, because we have to be very careful about changing our operational system, but I'd be interested in suggestions for how to efficiently input such observations, without impeding input of ordinary obs. Or confusing novice users. Any ideas?

Remember also - it's not a race. It's fine to take a break. Maybe try another ship.

Janet Jaguar

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 10205
  • Smell the sea, feel the sky, & fly into the mystic
Re: Suggestions for Phase 4
« Reply #22 on: September 08, 2013, 12:53:02 am »
And this is not the first time it has come up:
If there are, for example, 12 records per day, do I have to digitise them all? Why?
If they want to test and improve the algorithm for the weather forecast, you don't calculate the weather at, for example, 10 pm. You only calculate a trend. If I digitise 3 records (morning, noon and evening), would that not be enough for calculating the average temperature for this day?
Hi carbonat.

 Please do digitise them all.

 You're quite right, three measurements would generally be enough for calculating the daily average temperature, but that's not all we need to know - if we're worried about heat-stress we want the highest temperature of the day - if we're worried about frost-damage we might want the lowest, or we might want to know for how much of the day the temperature was below freezing. If a storm blows through the pressure will drop, and we need to know exactly when, or a day's continuous drizzle can produce exactly the same mean rainfall as a clear day with a single squall, and those are quite different weather types.
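The drizzle-versus-squall point can be put in numbers. The short Python sketch below uses invented hourly rainfall figures (arbitrary units, purely for illustration, not taken from any log), and `summarize` is a made-up helper name. Both days have the same daily mean, yet three readings a day at hypothetical times (8 am, noon, 8 pm) miss the squall completely.

```python
from statistics import mean

# Invented figures for illustration only.
drizzle = [1.0] * 24   # steady drizzle in every hour
squall = [0.0] * 24    # a clear day ...
squall[15] = 24.0      # ... with a single squall at 3 pm


def summarize(hourly):
    """Daily mean, hourly maximum, and number of wet hours."""
    return {
        "mean": mean(hourly),
        "max": max(hourly),
        "wet_hours": sum(1 for x in hourly if x > 0),
    }


def three_a_day(hourly, hours=(8, 12, 20)):
    """Keep only three readings: 8 am, noon and 8 pm (hypothetical choice)."""
    return [hourly[h] for h in hours]


print(summarize(drizzle))  # {'mean': 1.0, 'max': 1.0, 'wet_hours': 24}
print(summarize(squall))   # {'mean': 1.0, 'max': 24.0, 'wet_hours': 1}
print(mean(three_a_day(drizzle)), mean(three_a_day(squall)))  # 1.0 0.0
```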

So all the observations will be used for something - you're not wasting your time inputting them all, no matter how boring they might look.

Philip

We keep everything - every box and character entered, nothing is deleted.

I then write software to use the entries to make climate (and history) records. This software evolves as I learn more about how best to use the data transcribed.

At the moment the software looks at each entry (every box on the interface) separately, and for the numeric values (date, position and weather), for each box it requires at least two people to have input the same contents.

So if three people all enter a noon weather observation, they all agree that the pressure is 30.02, two say the air temperature is 72 and one thinks it's 27, and one says the wind direction is NNW, one says NW and the third enters nothing, then the climate records produced will use a pressure of 30.02, a temperature of 72, but no wind direction (as no two people agreed on the wind).
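As a rough illustration of that rule (not the project's actual processing software), the agreement check for a single box might look like the Python sketch below. The function name `consensus`, the field names, and the `min_agree` parameter are assumptions made for this example.

```python
from collections import Counter


def consensus(entries, min_agree=2):
    """Return the value that at least `min_agree` transcribers entered identically.

    Blank entries (None or "") are ignored. Returns None when no value
    reaches the required level of agreement.
    """
    counts = Counter(e for e in entries if e not in (None, ""))
    if not counts:
        return None
    value, n = counts.most_common(1)[0]
    return value if n >= min_agree else None


# The noon observation from the example above, one list per box.
noon = {
    "pressure": ["30.02", "30.02", "30.02"],
    "air_temp": ["72", "72", "27"],
    "wind_dir": ["NNW", "NW", None],
}
result = {field: consensus(values) for field, values in noon.items()}
print(result)  # {'pressure': '30.02', 'air_temp': '72', 'wind_dir': None}
```

With three transcribers only one value can reach two votes, so ties do not arise. With more transcribers a tie-breaking rule would be needed, which this sketch does not attempt.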

If there are 24 weather entries on the page and all three people have entered all 24, it is easy to say which of person A's entries should be matched with which of person B's (and C's). If one has entered 24, one 12, and one 6, this matching becomes difficult, we will certainly fail to use some entries (as they will only have been entered by one person and we can't check their accuracy) and we may fail to use others (because person B entered the weather at 1pm but the software thought it was for noon and compared it with the wrong entry by person A).

There is endless opportunity for improvement in the processing software. I envisage a time, for example, where the software says 'Hmm, 2 72s and 1 27 - but 27 is much more likely a temperature given all the sea-ice mentioned on this page, I reckon someone's made a mistake and I'm going for 27'. Or 'only one pressure transcribed for noon today - 29.89 - but that one transcriber has a history of reliability and the value looks plausible given other values from nearby ships - I'm keeping it'. At the moment we're not clever enough for this, but we keep all the entries, future analysts will be cleverer.

See also this draft paper written by Philip:
OldWeather.org: citizen science for climate reconstruction

Pommy Stuart

  • Shipherd
  • Hero Member
  • *****
  • Posts: 3595
  • A closed mouth gathers no foot.
Re: Suggestions for Phase 4
« Reply #23 on: September 08, 2013, 01:13:18 am »
Quote
'You always may suggest.  And the team is looking for ways to break pages up into watches, so transcribers can do much fewer at a time.'

Care would have to be taken with that approach as it would involve more downloads, longer to do a page and a possible continuity loss for those following the full course of the ships history (in the Misc pages).
All things have to be tried and I am willing to try anything on a trial basis, (but NOT in November).

I do think the post to carbonat.

Quote
Hi carbonat.

 Please do digitise them all.

 You're quite right, three measurements would generally be enough for calculating the daily average temperature, but that's not all we need to know - if we're worried about heat-stress we want the highest temperature of the day - if we're worried about frost-damage we might want the lowest, or we might want to know for how much of the day the temperature was below freezing. If a storm blows through the pressure will drop, and we need to know exactly when, or a day's continuous drizzle can produce exactly the same mean rainfall as a clear day with a single squall, and those are quite different weather types.

So all the observations will be used for something - you're not wasting your time inputting them all, no matter how boring they might look.

Philip
end quote
Should be made more easily seen by newbies, somehow?

I will say that 6 months of 24 obs in Shanghai is getting a bit monotonous, but we do move out in a few days' time. %^)
« Last Edit: September 08, 2013, 01:17:34 am by Pommy Stuart »

Janet Jaguar

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 10205
  • Smell the sea, feel the sky, & fly into the mystic
Re: Suggestions for Phase 4
« Reply #24 on: September 08, 2013, 01:34:27 am »
You do very well Stuart.  :)

Pommy Stuart

  • Shipherd
  • Hero Member
  • *****
  • Posts: 3595
  • A closed mouth gathers no foot.
Re: Suggestions for Phase 4
« Reply #25 on: September 08, 2013, 06:41:02 am »
Can we have the Google map working on the Page which shows the transcribers?
http://www.oldweather.org/ships/50874dcf09d4090755009992
Thanks.

Randi

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 13145
Re: Suggestions for Phase 4
« Reply #26 on: September 08, 2013, 08:50:49 am »
...
I do think the post to carbonat.
...
Should be made more easily seen by newbies, somehow?
...

It is in Type What You See - Yes, but ... - General Comments.

Craig

  • Shipherd
  • Hero Member
  • *****
  • Posts: 3360
Re: Suggestions for Phase 4
« Reply #27 on: September 09, 2013, 04:25:39 pm »
A different slant

It occurred to me that rotating the capture screen 90 degrees would be better than any of my previous suggestions regarding the copying of repeated values. In other words, rather than entering all variables for a single hour, you would enter all hours for a single variable. An advantage of this is that you always see the value in the context of its neighbours, which makes a mistake less likely and also makes it easier to guess the correct value if the writing is poor.

But the main advantage is that it would be easier to take advantage of repeated values. If there was a long sequence of repeats you would only have to enter the 1 AM value and any changes below it. When this is done, you could click "Fill" and the missing cells would be filled in for that variable. Then you click OK and you are onto the next variable. In this approach, you would not have to enter the hours, which would be generated automatically, saving even more time.

I can see that there could be some problems with this - such as alignment - but I don't think they are insurmountable. It would not be suited to all logs, either, but most of the more recent ones have hourly data. For this to work it wouldn't matter how many variables are recorded on the log page or what order they are in. And entering data vertically could avoid the need to display the entire table of transcriptions. My main reason for requesting this display was to be able to see our entries in context and detect mistakes. Of course, mistakes can still occur with this approach but our protection is always three transcribers doing the same log.
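To make the "Fill" step concrete, here is a minimal Python sketch under my own assumptions: the transcriber's typed cells for one variable arrive as a dict of hour to value, hours run from 1 to 24, and the function name `fill_column` is invented for the example. Each returned cell carries a flag saying whether it was filled automatically, so the interface could show filled cells in a different colour.

```python
def fill_column(entered, n_hours=24):
    """Fill the gaps in one variable's column by carrying the last typed value down.

    `entered` maps hour (1..n_hours) to the value the transcriber typed.
    Returns a list of (hour, value, was_filled) tuples, one per hour, so the
    hours never have to be typed and filled cells can be displayed differently.
    """
    if 1 not in entered:
        raise ValueError("the first hour must be entered before filling")
    column = []
    current = None
    for hour in range(1, n_hours + 1):
        if hour in entered:
            current = entered[hour]
            column.append((hour, current, False))
        else:
            column.append((hour, current, True))
    return column


# Wind direction typed at hour 1 and changed at hour 4; the rest is filled.
for cell in fill_column({1: "NW", 4: "N"}, n_hours=5):
    print(cell)
```

This sketch treats every untyped cell as a repeat, so a genuinely blank cell in the log would need some separate way of being marked.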

Many of our current transcribers are very proficient in keyboarding and are happy to continue with the current capture window. However, any real progress will come through attracting new transcribers, some of whom may not have great typing skills. Joan is probably right when she speculates that many are scared off by the large amount of data to enter and give up before completing a single page. Making it somewhat easier to transcribe might encourage some to continue.

I would be interested in hearing your thoughts on this proposal.

Randi

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 13145
Re: Suggestions for Phase 4
« Reply #28 on: September 09, 2013, 05:14:12 pm »
That sounds promising!

Just a quick response for now ;)

I don't think fill all blanks would be a good idea because some may really be blank.
However, if there was a button to copy the entry from the previous row that would be helpful. Since that value will be already available in the 'entry box' it shouldn't be too hard to implement, and since we will be only doing one column at a time we won't be copying inappropriate values.
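A small Python sketch of that alternative, with invented names (`ColumnEntry`, `type_value`, `leave_blank`, `copy_previous`): each row in the column is either typed, deliberately left blank, or copied from the row above by an explicit action, so nothing is filled in behind the transcriber's back.

```python
class ColumnEntry:
    """Entries for one variable, one row at a time (hypothetical interface model)."""

    def __init__(self):
        self.cells = []  # one item per row; None means a genuine blank in the log

    def type_value(self, value):
        """The transcriber types what is written in the log."""
        self.cells.append(value)

    def leave_blank(self):
        """The log really is blank for this row."""
        self.cells.append(None)

    def copy_previous(self):
        """The 'copy' button: repeat whatever was recorded for the previous row."""
        if not self.cells:
            raise ValueError("no previous row to copy")
        self.cells.append(self.cells[-1])


column = ColumnEntry()
column.type_value("NW")
column.copy_previous()   # same as the row above
column.leave_blank()     # genuinely blank in the log
column.type_value("N")
print(column.cells)      # ['NW', 'NW', None, 'N']
```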

For the hour, if you split it into blocks of 12 hours, there could be a box where the transcriber selects AM or PM. That would handle double columns of data and nautical days.

Craig

  • Shipherd
  • Hero Member
  • *****
  • Posts: 3360
Re: Suggestions for Phase 4
« Reply #29 on: September 09, 2013, 07:07:28 pm »
Good points, Randi. The "blank" problem occurred to me too. Of course, one could refrain from using the Fill function when this is the case. In most cases it would be just as easy to type a value as to press a button to copy a single cell. A message could appear when the Fill option is used to warn a user that legitimate blanks could be filled by this action. The message could be deactivated by the user when he/she gets tired of seeing it.

Having two blocks would make sense also because of the distance between the top and bottom tables. And flagging nautical days would be a useful option.