
Killing Floor 2 - Steam/EGS PC Only Hotfix Changelog

Bug fixes included
  • Addressed connection issues on EGS that prevented valid connections to available servers and impeded the reliability of Find a Match, Create a Match, and Server Browser join attempts.

Notes:

We apologize for the inconvenience this issue has caused our EGS player base and for the length of time it took us to find a solution. We immensely appreciate your patience and consideration as we prepared this patch. Given the nature and severity of this issue, we thought it fitting to provide more visibility into the issue, how it was resolved, and the process behind the fix. We believe this patch addresses the core issue around EGS multiplayer connections for the majority of users. As the fix is deployed and observed at full scale, we’ll continue to monitor behavior and see if any additional work is required.

What was the issue?

TL;DR: Servers could end up sending bad information about their status to EGS clients when certain incomplete connections happened. This issue became more prevalent as more players connected to a server.

More Technical:


At its core, the issue is rooted in the list of players with active connections to a server and how that list is reported to the game client. When a connection to a server failed, it was because the services we use on EGS were reporting the server as full, which did not match the true state of the active lobby. The Steam client uses a different service for this lobby list; that service matched the true state of the server, so Steam was unaffected. This is why individual servers would fail whenever they got into this incorrect state, and why, at a macro level, matchmaking behavior and reliability suffered once enough servers presented the problem. The issue only occurs situationally, with enough player interaction, and that interaction spiked with the latest Free Week on EGS. This is why we implemented forced restarts on the official server fleet every 12 hours: to clear the logjam of connections and return affected servers to a valid, available state.
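To make the mismatch concrete, here is a minimal Python sketch of a server whose lobby list has drifted away from its real connections. Everything in it is hypothetical (the `Server` class, its fields, and the six-player cap are illustrative assumptions, not the game's actual code or backend API); it only models the behavior described above.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy model of one game server. All names here are hypothetical."""
    max_players: int
    connected: set = field(default_factory=set)   # true active connections
    lobby_list: set = field(default_factory=set)  # what the lobby service believes

    def true_open_slots(self) -> int:
        # Slots that are really free, based on live connections.
        return self.max_players - len(self.connected)

    def reported_full(self) -> bool:
        # A client that trusts the lobby list sees the server as full
        # once the list reaches capacity, whatever the real state is.
        return len(self.lobby_list) >= self.max_players


server = Server(max_players=6)
for name in ("a", "b", "c", "d", "e", "f"):
    server.connected.add(name)
    server.lobby_list.add(name)

# Four players leave, but their lobby entries are never cleared.
for name in ("c", "d", "e", "f"):
    server.connected.discard(name)

print(server.true_open_slots())  # 4
print(server.reported_full())    # True
```

In this model, a client that reads the true connection state sees four open slots, while a client that reads the stale lobby list is told the server is full.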


How did you fix this issue?

TL;DR: We added validation to the server that detects when this occurs and clears phantom players from the server's player list so it can accept new connections.


More Technical:

We found that players remained listed in the server lobby when they disconnected before all the components had loaded that are needed to make the deallocation call to the backend service managing the lobby player lists. We therefore added validation checks that identify when this occurs and make sure the relevant deallocation calls are made within the 2-minute window in which the server state is regularly refreshed. Once previously connected players are properly cleared from the list, the server should be valid again for new connections to any available slots.
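The validation described above can be sketched roughly as a periodic sweep. This is an illustrative Python model only, not Tripwire's implementation: `LobbyServer`, `deallocate`, `refresh`, and `can_accept` are hypothetical names, the backend deallocation call is replaced by a local set operation, and treating the 2-minute threshold as a fixed refresh interval is an assumption.

```python
from dataclasses import dataclass, field

REFRESH_SECONDS = 120  # assumed reading of the 2-minute refresh window


@dataclass
class LobbyServer:
    """Toy model of a server and its lobby list. All names are hypothetical."""
    max_players: int
    connected: set = field(default_factory=set)   # true active connections
    lobby_list: set = field(default_factory=set)  # entries held by the lobby service
    last_refresh: float = 0.0

    def deallocate(self, player_id: str) -> None:
        # Stand-in for the backend deallocation call mentioned in the post.
        self.lobby_list.discard(player_id)

    def refresh(self, now: float) -> list:
        """Clear phantom players if a full refresh interval has elapsed."""
        if now - self.last_refresh < REFRESH_SECONDS:
            return []
        self.last_refresh = now
        # A phantom is listed in the lobby but has no live connection.
        phantoms = sorted(self.lobby_list - self.connected)
        for player_id in phantoms:
            self.deallocate(player_id)
        return phantoms

    def can_accept(self) -> bool:
        return len(self.lobby_list) < self.max_players


server = LobbyServer(max_players=2,
                     connected={"alice"},
                     lobby_list={"alice", "ghost"})
blocked_before = not server.can_accept()  # the phantom entry fills the last slot
cleared = server.refresh(now=120.0)       # sweep runs and removes "ghost"
skipped = server.refresh(now=130.0)       # too soon, nothing to do
```

After the sweep, `server.can_accept()` returns `True` again because the only remaining lobby entry belongs to a live connection.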

Why did it take so long to fix?

TL;DR: Every patch has to follow baked-in processes and timelines; the issue was difficult to reproduce reliably in a controlled environment; the systems that needed to be checked were numerous and delicate; and careful testing and validation of all dependent systems requires accuracy more than speed.

More Technical:


For this, we need to get into a bit of inside baseball as it pertains to fixing bugs within game development. While implementing the fix is one part of the equation, the vast majority of time spent addressing bugs encompasses:
  1. Detecting the issue
  2. Escalating it to our internal testers
  3. Replicating the issue in live and collecting information
  4. Forwarding that information to the development team or other partners managing external services we rely on
  5. Developing a controlled reproduction path within a development environment
  6. Ruling out possibilities to narrow down root causes
  7. Identifying who on the team is the best fit to solve the problem once the root cause is identified
  8. Securing that team’s time to pivot their focus toward the issue
  9. That team developing plans for solving the problem
  10. The team implementing the fix within the development environment
  11. Making a build with the fix to pass to the QA team
  12. QA validating the fix to be stable and then testing around the affected areas to identify any knock-on issues from the fix
  13. Addressing any knock-ons as separate issues and iterating through builds until a releasable candidate fix is confirmed
  14. Completing a full testing pass of all other content to ensure the rest of the build is in an expected and stable state
  15. Once builds are validated, uploaded, and certified, coordinating with our operations team to schedule a live deployment, along with the resources to manage the public release, so that the update goes out smoothly and server admin partners are notified

These are the required steps for every patch we manage, no matter how small the change or fix might be. The more complex and difficult the issue is to solve, the longer each of these steps may take. Suffice it to say that the root cause of this issue was particularly difficult to suss out because of the numerous dependencies within our backend systems, which all need to work in perfect concert to function properly. That is why we needed to take a very careful “measure twice, cut once” approach to ensure any fix actually solved the root issue and did not make anything else worse.

We understand that this process can feel opaque to you, the player on the other side, but we want to assure you that the time was necessary and that it took the coordinated effort of the whole team throughout this process to get to the solution we have today.

We hope you appreciate the window into our process and as always, thank you for your continued support!
 
I work for Tripwire and run this stuff.
@o2xVc3UuXp0NyBihrUnu , it wasn't introduced in the summer update. It was present since the inception of Killing Floor 2 on EGS, I believe, as I have faced it a lot of times ever since I started playing the game.
@Yoshiro , thanks a lot for finally patching it up without providing a workaround :D (restarting servers & services was always one that helped me going earlier)
 