Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for Rare BufferOverflow that could cause rcon to become unresponsive core/rcon.js #291

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

Matttor
Copy link

@Matttor Matttor commented Apr 19, 2023

Bug fix to decoder for rare overflow caused by packet sequence illustrated below,
(reads from random point in the buff to get size of packet that can cause non returning loop)
looking for a packet of L10 and then looking for something on the end that matches 0100000 returns false when TCP has cut packet as shown; the next data will be "fragments". This is compounded because Squad on rare occasions sends 0100000 to denote chat msg breaks and it does not follow spec.
image
Rework of decoder to better suite OWI Squads rcon flavour. (oversize packets, 0x01, NO 0001000, instead 0100000)
(Note all decode checking I have written is overkill but I have developed it regardless, all bad packets came from the bug above TCP does not break like that at the application layer).
See my simpleRcon examples for underlying logic.
I have 'Wrapped' my rcon client methods in SquadJS-like methods (looking to confirm I've not missed anything)
.execute(){ .once() is achieved by using the packet id as the event id, a new counter this.msgId starts at 20 and is reset at 80, tuned for timeout errors on current map call timeout goes to aprox 10 seconds.
This is my first draft, well tested in sandbox, not yet executed with SquadJS.

Refactor
Bug fix to decoder for rare overflow caused
Bug fix decoder, rework
@Matttor Matttor changed the title Fix for Rare BufferOverflow that could cause rcon to become unresponsive condition core/rcon.js Fix for Rare BufferOverflow that could cause rcon to become unresponsive core/rcon.js Apr 19, 2023
@Matttor
Copy link
Author

Matttor commented Apr 19, 2023

This also takes care of the password logging issue by writing to .netconnection directly,
no Socket issues, we use node:net
Improved error logging, you should get less general complaints about strange error/no error logging.
incorrect password will result in connection attempt loop, .net socket is closed by the server without a reject packet... suggest adding timeout to flag => check password.

@Matttor
Copy link
Author

Matttor commented Apr 19, 2023

fixes issues 262, 260, 200

@Matttor
Copy link
Author

Matttor commented Apr 19, 2023

I have sandboxed this against a live server, over 1 million requests and responses with 0% failure.
While legacy version in the sandbox failed within the first 100,000 as the server filled and chat became busier.

fix to string/body typo
Clean up of listeners, now cleans up all listeners,
auth once only, 
error on connect while connected,
minor method rework
removed unintended addition to constructor
Squad ver 4.3 was outputting over size > 4096, still assessing 4.4, using 8192 for now
fixed mistake
@Matttor
Copy link
Author

Matttor commented Apr 22, 2023

this.autoReconnectDelay = options.autoReconnectDelay || 5000; seems incorrect? - followed the core/rcon.js file, however it seems to be passed,
autoReconnectInterval: this.options.rconAutoReconnectInterval from squad-server/index.js

@sergelouie6
Copy link
Contributor

We've been using this for days now and no issues encountered yet.

We'll be monitoring it and see if we can find anything.

Server chat call sequence simplified, removed eventlistener, now call processChatPacket() directly on decoded chat packet, this allowed for further cleanup of connect()
Chat event listeners no longer cleared on rcon restart
this.removeAllListeners() was cleaning up all event listeners related to chat etc,
only need a broad fail proof way to clean up "auth" listeners.
.once is the best way I can see to create a trigger that resolves connect() promise.
for now I have added this.events, and moved "auth" event to this.events
this is to allow use of .removeAllListeners() to clean up connection auth listener that are hard to ID, while still resolving the promise and not bloating the code.
After testing I feel this is a better solution/extension to my last commit
"auth" is determined above net level "close", "data" etc but as a sudo grouping it makes enough sense here, new netConnection() when connect() is called cleans up,
#cleanUp is removed as we can now just handle everything in onClose
@milutinke
Copy link
Contributor

Wanted to ask if this is working properly and is it stable, since there has been quite a long time now for it to be tested?
I'd like to use this in sqcp

@sergelouie6
Copy link
Contributor

Wanted to ask if this is working properly and is it stable, since there has been quite a long time now for it to be tested?
I'd like to use this in sqcp

Hi, we've been using this for weeks now and have already tested numerous versions. However, I'm not sure yet if this is the latest version that we've been using.

@Matttor
Copy link
Author

Matttor commented May 26, 2023

I hope to push the latest version within a few days, it has had the reject/resolves cleaned up and some other minor tweaks, however for most people outwardly it will work just the same.

@milutinke
Copy link
Contributor

Awesome, sounds good to me.
Thanks for the fast response.

werewolfboy13 and others added 2 commits May 26, 2023 14:19
This version has has had promises cleaned up,  and some methods moved into connect()
@rwsender
Copy link

Is this going to be merged? Looking to potentially use SquadJS but would like to see these issues closed out first for stability. Thanks!

@milutinke
Copy link
Contributor

Is this going to be merged? Looking to potentially use SquadJS but would like to see these issues closed out first for stability. Thanks!

I've used Squad JS well over a year when I had my server, it rarely crashed.
You can use pm2 to auto restart it when it crashes.

@sergelouie6
Copy link
Contributor

Is this going to be merged? Looking to potentially use SquadJS but would like to see these issues closed out first for stability. Thanks!

Why not use SquadJS and use this file instead?

I've been using this for weeks now and no issues so far.

When we were on FTP before, rcon silently dies almost 10+ times a day and I had to restart every hour.

@sergelouie6
Copy link
Contributor

Is this going to be merged? Looking to potentially use SquadJS but would like to see these issues closed out first for stability. Thanks!

I've used Squad JS well over a year when I had my server, it rarely crashed. You can use pm2 to auto restart it when it crashes.

The main issue is not crashing, but the rcon silently dies without crashing.

@Matttor
Copy link
Author

Matttor commented Jun 22, 2023

Is this going to be merged? Looking to potentially use SquadJS but would like to see these issues closed out first for stability. Thanks!

If you are able to just swap the files, do that for now. Its a low effort fix :)
In terms of merging I think the authors/maintainers of SquadJS have packages that rely on older versions of Node and are looking at/testing before anything could happen re merge.
I was unaware of this when I wrote this rcon file and it should be used with Node 18+, that said everyone so far who is using it has had success.
It has a lot of good uptime collectively.

@Matttor
Copy link
Author

Matttor commented Jun 22, 2023

Can confirm it runs fine with Node 16.9.0
However I never checked or wrote it using that documentation from that version

@werewolfboy13
Copy link
Collaborator

@fantinodavide Are these still valid/good PRs?

@werewolfboy13
Copy link
Collaborator

Conflicts will need resolving now.

@Matttor
Copy link
Author

Matttor commented Nov 1, 2023

Not sure what happened there,
delete 99 to 169,
also I have some minor fixes:
change the oversize buffer value to suit squad rcon (4096 is too small)
if (length > 4154) Logger.verbose("RCON", 1, Error occurred. Oversize, "${length}" > 4154)

There is one minor bug fix to handle possible event listener leak, I can do another pr or you may just do the following;
add
this.removeAllListeners("server");
this.removeAllListeners("auth");

at
async connect() {
return new Promise((resolve, reject) => {
if (this.client && this.connected && !this.client.destroyed) return reject(new Error("Rcon.connect() Rcon already connected."));
this.removeAllListeners("server");
this.removeAllListeners("auth");
this.on("server", (pkt) => this.processChatPacket(pkt));

Or I can do another pr

@Matttor
Copy link
Author

Matttor commented Nov 2, 2023

I understand now, to resolve this conflict by re-adding lvl 4 pwd vBlog;

#sendAuth() {
Logger.verbose("RCON", 1, Sending Token to: ${this.host}:${this.port});
Logger.verbose('RCON', 4, Writing packet with type "${this.type.auth}" and body "${this.password}".);
this.client.write(this.#encode(this.type.auth, 2147483647, this.password).toString("binary"), "binary");
}

Brought upto date with pw std, listener and packet size tweaks
err -> error
@Matttor
Copy link
Author

Matttor commented Nov 2, 2023

Have made the changes, this is now inline with the version people have been using, + the lvl 4 pw logging etc

@werewolfboy13
Copy link
Collaborator

@Matttor still getting conflict warnings, you may need to resolve them through the PR/CLI process first. It is still preventing me to do anything currently. If you can't fix I'll take a look later but please don't make another PR. Want to keep everything tidy.

@fantinodavide
Copy link
Contributor

@Matttor just had an issue, the badPacket() method got called after updating the player list, probably too frequently, but setting the condition as follows fixed my issue:
if (bufSize > 6144 || bufSize < 10) return this.badPacket();
6144 is just a random number, we should work on finding the appropriate one, but gets the job done.

@fantinodavide fantinodavide mentioned this pull request Dec 13, 2023
@fantinodavide
Copy link
Contributor

@Matttor some important notes here #318 (comment)

@Ulibos
Copy link

Ulibos commented Dec 14, 2023

Sorry, I gave you a bit misleading info yesterday. 11kb package I was receiving was a TCP payload size, not an rcon's packet_size. Biggest packet_size I've seen so far was ~4200b. If you prefer to keep some sort of a safeguard then I'd go with 4096*2 as it would still save you from 99% of the junk yet let obscenely fat legit packages go through. That is strictly optional suggestion though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants