Bug #35251

Signature checking causes I/O spikes at high player counts

Added by dslyecxi almost 3 years ago. Updated over 2 years ago.

Status: Closed
Start date: 06/28/2012
Priority: Urgent
Assignee: -
% Done: 0%
Category: Performance breakdown
Target version: 1.62.95248
Affected ArmA II version: 1.61 Beta
Reproduced by another DH user: Yes
I am using some Mods: No
Single / Multi Player?: MP Only
I am using: CO (OA+A2)
Reproducible for you: No

Description

Our server uses v2 signatures and has 1761 signed mod files. At high player counts (anything above about 60, and it seems to get worse the higher the player count goes), the server undergoes major I/O spikes that simultaneously drop outbound bandwidth and cause long periods of desync. Because of this, we have to keep signature checking off.

Attached is an image showing Process Explorer's I/O page at the time of two of these incidents.

Having this fixed would be a godsend.

signature_spikes.png (47.5 kB) dslyecxi, 06/28/2012 15:03

120714_sig_spike_condensed.png (12 kB) dslyecxi, 07/19/2012 21:14

ProcMon.png (200.9 kB) hofi02, 07/19/2012 22:04

ProcMon3.png (207.2 kB) hofi02, 07/19/2012 22:50


Related issues

related to ARMA2 Community Issue Tracker - Bug #38093: Server accessing paths of files as located on clients Closed 07/22/2012

History

Updated by Dwarden almost 3 years ago

  • Category set to Performance breakdown
  • Status changed from New to Assigned
  • Assignee set to Dwarden
  • Priority changed from Normal to Urgent
  • Target version set to 1.61 BETA
  • Operating system set to Windows Server 2008 R2 64 bit
  • Affected ArmA II version set to 1.61 Beta
  • Reproduced by another DH user changed from No to Yes

Thanks for repeatedly testing this issue; it was a long wait for this CIT :)

Updated by Ander over 2 years ago

  • Difficulty changed from Not set to Veteran
  • I am using set to CO (OA+A2)
  • Single / Multi Player? set to MP Only

Can confirm this issue happens on pretty beefy systems (SSD RAID 0, dual overclocked X5550s).

FPS is good, then suddenly drops to 0 during each check.
The checks happen far too often and affect performance too much.

Disk reading is far too intensive on the server side.

Why not let BE handle the checks? If BE is bypassed, the cheater will cheat anyway.
Is there no way to calculate the hashes for the files just once?

The picture shows FPS dropping to 0 after filling "one" server up with around 30 players:
http://img.ctrlv.in/50083470b1aad.jpg
FPS is stable when this integrity check is not used, even while running 4x50-slot servers.

Updated by Suma over 2 years ago

  • Assignee changed from Dwarden to Suma

Updated by kju over 2 years ago

Well, your server FPS is pretty low. Whether that is caused by the signature check or by something else, and whether it makes things worse or is unrelated, is also worth considering.

With that said, I would assume the server could cache the hashes (or whatever data it needs) after computing them once, so that it would no longer need heavy I/O activity. Without knowing how the signature system is implemented, though, that's just a poor guess.

Updated by Suma over 2 years ago

There is no reason for any substantial I/O activity on the server to check signatures. The server does not compute hashes; it only verifies the hashes and signatures computed by clients against a public key. Hash computation can be demanding on clients, but the whole process should be very lightweight on the server. Let us see whether we can find anything abnormal.
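Suma's point that server-side verification should be lightweight can be seen in a toy sketch. It assumes a generic "signature over a client-reported hash" scheme; the real ArmA 2 v1/v2 signature format is not described in this ticket. Textbook RSA with tiny, insecure parameters is used only to show the shape of the check: one modular exponentiation and one comparison, with no file access on the server.

```python
import hashlib

# Textbook RSA with tiny, insecure parameters (p=61, q=53), used only to show
# the shape of the check; this is NOT the game's real signature scheme.
N, E = 3233, 17   # public key known to the server
_D = 2753         # private exponent, held only by whoever signs the addon


def digest_to_int(digest: bytes) -> int:
    # Reduce the digest into the toy modulus range.
    return int.from_bytes(digest, "big") % N


def sign(digest: bytes) -> int:
    # Done once, offline, by the addon author (never by the server).
    return pow(digest_to_int(digest), _D, N)


def server_verify(client_digest: bytes, signature: int) -> bool:
    # The server only does arithmetic on values it received: no file reads.
    return pow(signature, E, N) == digest_to_int(client_digest)
```

A real implementation would use a proper crypto library and full-size keys, but the cost profile is the same: the expensive part (hashing file contents) happens on the client.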

Updated by hofi02 over 2 years ago

I can confirm this. I run a DayZ server and also see huge disk I/O spikes at high player counts when players join, which sometimes causes server-wide desync.

Updated by dslyecxi over 2 years ago

Here is an update from our attempt to try signatures again. This behavior only happens when they are turned on; we never see an I/O spike like this without them. In this one you can see it reading 40+ MB/s.

Updated by Suma over 2 years ago

Can you try using ProcMon (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) to check which files your server is accessing? If you see no access to named files, the disk traffic might be the page file, but I think it is more likely that some game files are being accessed, and it would be great to know which.

Set the filter to include your DS process only, and watch for file operations.

When I tried it in a simple scenario (10 clients, verifySignatures=2), I did not see any FPS drops or I/O spikes. I will try to set up a larger test later, but hopefully you will find something in the meantime.
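Once a ProcMon capture has been saved as CSV, the files the server reads most can be ranked with a short script. A minimal sketch, assuming the default CSV export columns (`Process Name`, `Operation`, `Path`, `Detail`) and a `Detail` field such as `Offset: 0, Length: 4,096`; the function name is hypothetical, and `arma2oaserver.exe` is assumed as the dedicated server's process name.

```python
import csv
import io
import re
from collections import Counter

# Assumes ProcMon's default CSV export and a Detail field such as
# "Offset: 0, Length: 4,096". Hypothetical helper, not a ProcMon feature.
_LENGTH = re.compile(r"Length: ([\d,]+)")


def bytes_read_per_file(csv_text, process="arma2oaserver.exe"):
    """Sum ReadFile lengths per path for one process, largest first."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Process Name"].lower() != process.lower():
            continue  # mirror the "DS process only" filter
        if row["Operation"] != "ReadFile":
            continue
        m = _LENGTH.search(row["Detail"])
        if m:
            # Strip thousands separators before converting to int.
            totals[row["Path"]] += int(m.group(1).replace(",", ""))
    return totals.most_common()
```

For a saved export, open it with `encoding="utf-8-sig"` so a leading byte-order mark does not end up in the first column name.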

Updated by hofi02 over 2 years ago

It is accessing many PBO files, not only the one shown on the screen.
I moved some files to a RAM disk to reduce the effect a little.

Edit: I'll try to collect more data when more players join the server.

Updated by Suma over 2 years ago

hofi02 wrote:

it is accessing many pbo files, not only the one shown on the screen

I see two things which are unexpected to me:

- your server seems to be accessing chernarus_data_layers a lot. I did not see this on my server and I find it strange, as this pbo contains terrain textures only, which should not be needed by a server
- the last entry visible in the monitor is a failed attempt to reach a Steam game installation on the C: drive. This reminds me of an issue we are investigating with little success so far, where the server seems to be accessing paths somehow provided by clients

Updated by hofi02 over 2 years ago

The failure is because I relocated Arma 2's AddOns folder to a RAM drive; without the RAM drive, the impact is much worse.
I attached another screenshot.

Updated by Suma over 2 years ago

It would be great for me to see the spikes on my own server. It is not necessary for the spikes to be causing a serious issue, they just need to be reliably observable. I have tried 10 players on Chernarus and I did not see any indication of any I/O spikes at all.

I would like to know:
- how many players are needed to see the spikes? Are 70 needed, or is 50 enough? What about 30?
- how often can the spikes be seen? Is there some pattern in their occurrence? Are they happening regularly? If they are, what is the time between them?
- are some mods/addons needed, or can they be seen with vanilla install as well?
- is some particular mission needed, or can they be seen in a simple mission with no AI at all?
- are spikes specific to some world? Can they be seen on Chernarus only, or on Takistan as well? What about Utes or Desert?
- during the spikes, what files are accessed?
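The frequency questions above can be answered from a sampled disk-read series like the one Process Explorer graphs. A minimal sketch, assuming the samples have been exported as (seconds, bytes-per-second) pairs; the threshold is an arbitrary choice (for example the 40 MB/s reads reported earlier in this thread) and both function names are hypothetical.

```python
def find_spikes(samples, threshold):
    """Group consecutive above-threshold samples into spikes.

    samples: list of (seconds, read_bytes_per_sec) pairs, sorted by time.
    Returns a list of (start_seconds, peak_rate) tuples.
    """
    spikes = []
    in_spike = False
    for t, rate in samples:
        if rate >= threshold:
            if not in_spike:
                spikes.append([t, rate])  # a new spike starts here
                in_spike = True
            else:
                spikes[-1][1] = max(spikes[-1][1], rate)  # track the peak
        else:
            in_spike = False
    return [tuple(s) for s in spikes]


def intervals(spikes):
    """Seconds between the starts of consecutive spikes."""
    starts = [start for start, _ in spikes]
    return [b - a for a, b in zip(starts, starts[1:])]
```

Roughly constant intervals would point to a periodic check, while irregular ones that line up with joins and leaves would point to connection events.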

Updated by hofi02 over 2 years ago

I changed signature checking to v1 and the heavy I/O spikes are gone, currently at 32 players.

Edit: they came back at around 40 players.

Updated by hofi02 over 2 years ago

I know what is causing it for me: I moved the server files to C:\server\arma2 and removed any spaces from the path, and that fixed it for me.

Can anyone confirm this?

50 players and no I/O spikes so far.

Updated by Dwarden over 2 years ago

Can't confirm; at least 3 hosts with >55-player servers have told me this 'fix' doesn't work...

Updated by hofi02 over 2 years ago

It came back for me too. Yesterday it was running fine without a single I/O spike, which is strange; maybe it depends on the client.
It sometimes happens when someone is connecting to the server.

Updated by dslyecxi over 2 years ago

- the last entry visible in the monitor is a failed attempt to reach a Steam game installation on the C: drive. This reminds me of an issue we are investigating with little success so far, where the server seems to be accessing paths somehow provided by clients

I'm watching our server through ProcMon right now at 100 players. I do not see signature spikes yet, but I do see a lot of createFile attempts on paths that are not valid for our server, such as:

 C:\program files\bohemia interactive\arma2\
 c:\program files (x86)\steam\steamapps\common\arma2 operation arrowhead\
 d:\games\arma2\
 c:\games\oa\
 c:\games\steam\

Our actual install is in D:\Steam\steamapps\common\arma 2 operation arrowhead and is referenced as d:\oa_ref\ via a symlink.

These createFile attempts result in "PATH NOT FOUND" and happen pretty much non-stop, something like a dozen or more each second at 100 players. It spans all sorts of files - addons as well as default ArmA content.

While I wouldn't draw any conclusions about us not seeing I/O spikes currently, I will say that our new server has ArmA2 on an SSD. If we don't see any spikes this session, it might be worth considering that as a possible mitigation of the issue.
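The failed lookups described above can be tallied from a ProcMon CSV export to see which client-style roots the server probes most. A minimal sketch, assuming the export has `Path` and `Result` columns (as in the default CSV layout); grouping by the first three path components is an arbitrary choice, and the function name is hypothetical.

```python
import csv
import io
from collections import Counter
from pathlib import PureWindowsPath


def failed_roots(csv_text, depth=3):
    """Count 'PATH NOT FOUND' results by their leading path components."""
    roots = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Result"] != "PATH NOT FOUND":
            continue
        # e.g. C:\Games\OA\dta\b.pbo -> c:\games\oa (case-folded, like Windows paths)
        parts = PureWindowsPath(row["Path"]).parts[:depth]
        roots[str(PureWindowsPath(*parts)).lower()] += 1
    return roots.most_common()
```

Lowercasing merges entries that differ only in case, since Windows paths are case-insensitive.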

Updated by dslyecxi over 2 years ago

Just ran into this on Celle2 in an ~80-90 player coop.

Image of it - the major large spikes on the right half, with corresponding low network activity, are the OA ones.
https://dl.dropbox.com/u/263501/dh/929pm_heavy_io_spikes_in_celle2_coop.png

Netlog, server rpt, and process monitor log:
https://dl.dropbox.com/u/263501/dh/iospikes_around_910pm_and_earlier_923cut.7z

Updated by dslyecxi over 2 years ago

I connected to our server during the above-mentioned mission just now and observed the following.

When I connected, a major I/O spike occurred. Shortly afterwards another spike occurred, corresponding to another player joining. I then disconnected and a MASSIVE I/O spike happened at that exact moment. Upon reconnecting there was only a small spike. See the image below for more.

Also, as I was typing this, we had two major spikes - one was from accessing characters_e.pbo, the other from caa1_p_buildings.pbo. When I disconnected earlier and saw that spike happen, IIRC it was wheeled*.pbo that showed up.

https://dl.dropbox.com/u/263501/dh/io_spike_cause.png

I will save the new version of the procmon file and link it later on, containing these more recent incidents.

Updated by Suma over 2 years ago

Netlog, server rpt, and process monitor log:

There are absolute time stamps in the process monitor log and in the rpt, relative time stamps in the netlog, but there are no time stamps in the Process Explorer graph. Would it be possible for you to check what were the times when the I/O spikes were happening, so that I can match it with the logs?

Updated by Suma over 2 years ago

When I connected, a major IO spike occurred
I then disconnected and a MASSIVE IO spike happened at the precise time.

That is very interesting, indeed. Do such spikes happen only with verifySignatures used, or also with verifySignatures = 0 ?

Updated by Suma over 2 years ago

I'm watching our server through procmon atm @ 100 players. I do not see signature spikes yet but I do see a lot of createFile attempts on paths that are not valid for our server, such as:

These createFile attempts result in "PATH NOT FOUND" and happen pretty much non-stop, something like a dozen or more each second at 100 players. It spans all sorts of files - addons as well as default ArmA content.

This is an issue we are aware of (I have now created ticket #38093 for it). It may or may not be related; we have been unable to find what is causing it so far, and I would definitely be interested to learn more about it. If you have anything to add, please do so there.

Updated by Suma over 2 years ago

  • Status changed from Assigned to Resolved

It currently seems the two issues have the same cause, and fixing #38093 should hopefully fix this as well. However, as I cannot see how this could cause any I/O spikes on a player disconnect, careful testing will be needed once server build 95218 or later is available.

Updated by kju over 2 years ago

  • Target version changed from 1.61 BETA to Upcoming version OA

Updated by dslyecxi over 2 years ago

There are absolute time stamps in the process monitor log and in the rpt, relative time stamps in the netlog, but there are no time stamps in the Process Explorer graph. Would it be possible for you to check what were the times when the I/O spikes were happening, so that I can match it with the logs?

Here is the procmon log including the annotated spikes:
https://dl.dropbox.com/u/263501/dh/120721_1002pm_cutoff.7z
includes: https://dl.dropbox.com/u/263501/dh/io_spike_cause.png

The timestamps from the server relating to me connecting/disconnecting/etc are:


21:50:02 Dslyecxi uses modified data file
21:50:02 Player Dslyecxi connecting.
21:50:06 Player Dslyecxi connected (id=1459974).
21:50:23 Smokedawg uses modified data file
21:50:23 Player Smokedawg connecting.
21:50:25 Player Smokedawg connected (id=3911494).
21:51:27 Player Dslyecxi disconnected.
21:51:49 Player Rouza disconnected.
21:52:20 Dslyecxi uses modified data file
21:52:20 Player Dslyecxi connecting.
21:52:23 Player Dslyecxi connected (id=1459974).
21:55:47 Player Shredder disconnected.
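To match spikes with the logs as Suma asked, the rpt events can be lined up against spike times read off the annotated graph. A minimal sketch, assuming rpt lines have the `HH:MM:SS Player <name> connecting|connected|disconnected` shape shown above; the 5-second window and both helper names are hypothetical, and times that cross midnight are not handled.

```python
import re
from datetime import datetime, timedelta

# Matches rpt lines such as "21:50:02 Player Dslyecxi connecting."
_EVENT = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) Player (.+?) (connecting|connected|disconnected)\b"
)


def parse_events(rpt_lines):
    """Extract (time, player, action) tuples; other rpt lines are ignored."""
    events = []
    for line in rpt_lines:
        m = _EVENT.match(line.strip())
        if m:
            t = datetime.strptime(m.group(1), "%H:%M:%S")
            events.append((t, m.group(2), m.group(3)))
    return events


def events_near(events, spike_time, window_s=5):
    """Return (player, action) pairs within window_s seconds of a spike."""
    spike = datetime.strptime(spike_time, "%H:%M:%S")
    window = timedelta(seconds=window_s)
    return [(name, action) for t, name, action in events
            if abs(t - spike) <= window]
```

For example, a spike read off the graph at 21:51:27 would map to the disconnect logged at that second.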

Updated by Suma over 2 years ago

As a temporary workaround until the next server release, the server admin could place the game files in a location where clients are unlikely to have them, i.e. avoid locations like Program Files or Steam\steamapps and place them somewhere like K:\BlahBlah\Arma (do not just symlink them: it is important that the server cannot access them in locations where clients could search for them).
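Before relying on this workaround, an admin could check whether any client-style location is still reachable on the server. A minimal sketch; the path list is copied from the ProcMon trace posted earlier in this thread and is not exhaustive, and the helper name is hypothetical.

```python
import os

# Client-side install locations seen in the ProcMon trace earlier in this
# thread; extend the list for your own environment.
CLIENT_LIKE_PATHS = [
    r"C:\program files\bohemia interactive\arma2",
    r"c:\program files (x86)\steam\steamapps\common\arma2 operation arrowhead",
    r"d:\games\arma2",
    r"c:\games\oa",
    r"c:\games\steam",
]


def reachable_client_paths(candidates=CLIENT_LIKE_PATHS):
    """Return the candidate locations that actually exist on this machine.

    os.path.exists follows symlinks, so a symlinked copy is reported too,
    which matches the advice not to merely symlink the game files.
    """
    return [p for p in candidates if os.path.exists(p)]
```

An empty result means none of the listed locations resolve on the server; any path returned is one the server could still hit when handling client-provided paths.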

Updated by kju over 2 years ago

  • Target version changed from Upcoming version OA to 1.62 BETA

Updated by kju over 2 years ago

  • Target version changed from 1.62 BETA to 1.62.95248

Updated by dslyecxi over 2 years ago

Looks like this can be confirmed as fixed. We didn't see any unusual I/O or ProcMon activity in our session yesterday with signatures on.

Updated by Suma over 2 years ago

  • Status changed from Resolved to Closed
  • Assignee deleted (Suma)
