@Boinc developers: Strange effect on 2-CPU machine / Linux

Anonymous
Topic 12964

EXECUTIVE SUMMARY: The 4.19 client has known bugs when working through a proxy server. Please consider trying a more recent BOINC client.

http://einstein.phys.uwm.edu//host/212379/tasks

But 3 hours later it requested another one: 1003925. And it requested only one? The second one of the new pair was requested in the morning.

You mean 7 hours later, right?

I checked a little bit more and found out that the workunit 1003401 did not reach my machine.

I am studying the scheduler contact for this lost WU. It's very helpful that your machine has NTP-accurate timestamps!

I think your proxy timeout is fine. Here's the relevant part of the log:

192.168.45.243 - - [10/May/2005:20:53:33 +0200] "POST http://einstein.phys.uwm.edu:80/EinsteinAtHome_cgi/cgi HTTP/1.0" 200 8804
192.168.45.243 - - [10/May/2005:20:54:34 +0200] "POST http://einstein.phys.uwm.edu:80/EinsteinAtHome_cgi/cgi HTTP/1.0" 200 8804

The first of these is the request for the lost WU. What do the 200 and 8804 refer to?
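
If the proxy writes its access log in the usual Common Log Format (an assumption, since the proxy configuration is not shown in this thread), the last two fields would be the HTTP status code and the size in bytes of the reply passed back to the client. A minimal Python sketch that pulls those fields out of the first line above; the regular expression and the name parse_proxy_line are illustrative only, not part of BOINC or of any proxy:

```python
import re
from datetime import datetime, timezone

# Assumed layout (Common Log Format):
# client ident user [timestamp] "METHOD URL PROTOCOL" status size
CLF = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_proxy_line(line):
    """Return the interesting fields of one access-log line, or None."""
    m = CLF.match(line)
    if m is None:
        return None
    # Month names are parsed with the default (C/English) locale.
    local = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    size = m.group("size")
    return {
        "client": m.group("client"),
        "utc": local.astimezone(timezone.utc),  # normalise to UTC
        "status": int(m.group("status")),       # HTTP status code
        "size": None if size == "-" else int(size),  # reply size in bytes
    }

line = ('192.168.45.243 - - [10/May/2005:20:53:33 +0200] '
        '"POST http://einstein.phys.uwm.edu:80/EinsteinAtHome_cgi/cgi HTTP/1.0" '
        '200 8804')
entry = parse_proxy_line(line)
print(entry["status"], entry["size"], entry["utc"].isoformat())
```

Read that way, the proxy itself logged a successful (200) reply of 8804 bytes for the request, which would point at the client side rather than at a proxy timeout.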

Here is the relevant bit of the scheduler log:

2005-05-10 18:53:33 [PID=22527] [debug ] REQUEST_METHOD=POST CONTENT_TYPE=application/octet-stream HTTP_ACCEPT= HTTP_USER_AGENT=
2005-05-10 18:53:33 [PID=22527] [debug ] CONTENT_LENGTH=3645 from X.X.X.X
2005-05-10 18:53:33 [PID=22527] [normal ] Handling request: IP X.X.X.X auth XXXX, host 212379, platform i686-pc-linux-gnu, version 4.19
2005-05-10 18:53:33 [PID=22527] [normal ] OS version Linux 2.4.19-64GB-SMP
2005-05-10 18:53:33 [PID=22527] [debug ] Request [HOST#212379] Database [HOST#212379] Request [RPC#2] Database [RPC#0]
2005-05-10 18:53:33 [PID=22527] [normal ] Processing request from [USER#12758] [HOST#212379] [IP 84.245.146.10] [RPC#2] core client version 4.19
2005-05-10 18:53:33 [PID=22527] [normal ] [HOST#212379] got request for 17280.369637 seconds of work; available disk 11.155923 GB
2005-05-10 18:53:33 [PID=22527] [debug ] [HOST#212379]: has file H1_0688.5
2005-05-10 18:53:33 [PID=22527] [debug ] send_old_work() no feasible result older than 168.0 hours
2005-05-10 18:53:33 [PID=22527] [debug ] in_send_results_for_file(H1_0688.5, 0) prev_result.id=4092185
2005-05-10 18:53:33 [PID=22527] [debug ] est cpu dur 43116.118341; running_frac 1.000000; rsf 1.000000; est 43116.118341
2005-05-10 18:53:33 [PID=22527] [debug ] Sorted list of URLs follows [host timezone: UTC-25200]
2005-05-10 18:53:33 [PID=22527] [debug ] zone=-21600 url=http://einstein.phys.uwm.edu
2005-05-10 18:53:33 [PID=22527] [debug ] zone=-18000 url=http://einstein.aset.psu.edu
2005-05-10 18:53:33 [PID=22527] [debug ] zone=-18000 url=http://morel.mit.edu
2005-05-10 18:53:33 [PID=22527] [debug ] zone=+00000 url=http://einstein.astro.gla.ac.uk
2005-05-10 18:53:33 [PID=22527] [debug ] zone=+03600 url=http://einstein.aei.mpg.de
2005-05-10 18:53:33 [PID=22527] [debug ] [HOST#212379] Sending app_version einstein i686-pc-linux-gnu 480
2005-05-10 18:53:33 [PID=22527] [debug ] [HOST#212379] Already has file H1_0688.5
2005-05-10 18:53:33 [PID=22527] [debug ] [HOST#212379] reducing disk needed for WU by 14736000 bytes (length of H1_0688.5)
2005-05-10 18:53:33 [PID=22527] [debug ] est cpu dur 43116.118341; running_frac 1.000000; rsf 1.000000; est 43116.118341
2005-05-10 18:53:33 [PID=22527] [normal ] [HOST#212379] Sending [RESULT#4098898 H1_0688.5__0688.9_0.1_T04_Fin1_2] (fills 43116.12 seconds)
2005-05-10 18:53:33 [PID=22527] [normal ] [HOST#212379] Sent 1 results [scheduler ran 0 seconds]
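
The two logs can be lined up by time: 20:53:33 +0200 in the proxy log is 18:53:33 if the scheduler log is written in UTC (an assumption, but it fits the two-hour offset seen here). A rough Python sketch of that check, using two lines copied from the scheduler log above; the regular expressions and the name sent_results are hypothetical helpers, not BOINC code:

```python
import re
from datetime import datetime, timedelta, timezone

# Scheduler log line: "YYYY-MM-DD HH:MM:SS [PID=n] [level ] message"
SCHED = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[PID=(?P<pid>\d+)\] '
    r'\[(?P<level>\w+)\s*\] (?P<msg>.*)'
)
# Only the "Sending [RESULT#...]" messages, not "Sending app_version ..."
SENT = re.compile(
    r'\[HOST#(?P<host>\d+)\] Sending \[RESULT#(?P<result>\d+) (?P<name>[^\]]+)\]'
)

def sent_results(lines):
    """Yield (utc_time, host_id, result_id, result_name) for each result sent."""
    for line in lines:
        m = SCHED.match(line)
        if m is None:
            continue
        s = SENT.search(m.group("msg"))
        if s is None:
            continue
        # Assumption: scheduler timestamps are UTC.
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        ts = ts.replace(tzinfo=timezone.utc)
        yield ts, int(s.group("host")), int(s.group("result")), s.group("name")

scheduler_log = [
    "2005-05-10 18:53:33 [PID=22527] [debug ] [HOST#212379] Sending app_version einstein i686-pc-linux-gnu 480",
    "2005-05-10 18:53:33 [PID=22527] [normal ] [HOST#212379] Sending [RESULT#4098898 H1_0688.5__0688.9_0.1_T04_Fin1_2] (fills 43116.12 seconds)",
]

# Time of the first POST in the proxy log, with its +0200 offset.
proxy_time = datetime(2005, 5, 10, 20, 53, 33,
                      tzinfo=timezone(timedelta(hours=2)))

results = list(sent_results(scheduler_log))
for ts, host, result, name in results:
    gap = abs((ts - proxy_time).total_seconds())
    print(f"host {host} result {result} ({name}): {gap:.0f} s from proxy entry")
```

A gap of zero seconds supports matching the first proxy entry to this scheduler reply.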

The problem may be a simple one to fix. The 4.19 client has some proxy-related bugs. You might try replacing it with something more recent.

While looking at this I noticed another bug: the RPC number stays at zero in the database (the log above shows Request [RPC#2] but Database [RPC#0]). I think I know where this comes from, and it's unrelated to the lost WU. I'm going to try and track this down with David Anderson later today.
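
To spot other requests with the same symptom, the RPC numbers can be extracted from the debug line and compared. This is only a sketch: the name rpc_seqnos is made up for illustration, and treating "database value behind the request value" as suspicious is an assumption about how the scheduler is meant to behave.

```python
import re

# Matches the tail of the debug line: "Request [RPC#n] Database [RPC#m]"
RPC = re.compile(r'Request \[RPC#(?P<req>\d+)\] Database \[RPC#(?P<db>\d+)\]')

def rpc_seqnos(line):
    """Return (request_rpc, database_rpc) from a scheduler debug line, or None."""
    m = RPC.search(line)
    if m is None:
        return None
    return int(m.group("req")), int(m.group("db"))

line = ("2005-05-10 18:53:33 [PID=22527] [debug ] Request [HOST#212379] "
        "Database [HOST#212379] Request [RPC#2] Database [RPC#0]")

pair = rpc_seqnos(line)
if pair is not None:
    req, db = pair
    if db < req:
        print(f"database RPC number {db} lags behind request RPC number {req}")
```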

Cheers,
Bruce