Lisa S. answered 05/08/25
Bachelor's in Computer Science with 5+ Years of Teaching Experience
In general, after you have tried fitting with default values and early stopping (for XGBoost), you would do a grid search or random search over a chosen range (wider initially) for around 2-4 hyperparameters at a time. Then, once you narrow down the range, you run a finer search, rinse and repeat until the gains plateau or you hit your goal.
More specifically for Random Forest:
The main hyperparameters to tune are n_estimators (number of trees) and max_depth; max_features is also worth tuning if you need more gains.
- Use RandomizedSearchCV to sample 20-50 random combinations of n_estimators and max_depth (and, optionally, max_features), and fit with cross-validation to pick out the best settings.
- You can then fine-tune by running GridSearchCV over a narrow grid around those best settings, as in the sketch after this list.
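Here is a minimal sketch of that two-stage search with scikit-learn. The ranges, n_iter=30, and cv=5 are illustrative assumptions, not recommendations, and the toy dataset just stands in for your own:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Toy data so the sketch runs end to end; swap in your own X_train, y_train.
X_train, y_train = make_classification(n_samples=500, n_features=20, random_state=42)

# Stage 1: coarse random search over deliberately wide (illustrative) ranges.
param_dist = {
    "n_estimators": np.arange(100, 1001, 50),
    "max_depth": list(range(3, 31)) + [None],
    "max_features": ["sqrt", "log2", 0.5],  # the optional third hyperparameter
}
rand_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=30,   # 20-50 sampled combinations, as above
    cv=5,
    n_jobs=-1,
    random_state=42,
)
rand_search.fit(X_train, y_train)
best = rand_search.best_params_

# Stage 2: finer grid search in a narrow window around the best settings.
fine_grid = {
    "n_estimators": [max(50, best["n_estimators"] - 50),
                     best["n_estimators"],
                     best["n_estimators"] + 50],
    "max_depth": ([None] if best["max_depth"] is None
                  else [best["max_depth"] - 1, best["max_depth"],
                        best["max_depth"] + 1]),
}
grid_search = GridSearchCV(
    RandomForestClassifier(max_features=best["max_features"], random_state=42),
    param_grid=fine_grid,
    cv=5,
    n_jobs=-1,
)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_, grid_search.best_score_)
```

Keeping stage 1 wide is what lets stage 2 stay small: you only grid-search a handful of values once the random search has found the right neighborhood.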
For XGBoost:
The main hyperparameters you'd want to tune are learning_rate, max_depth, and n_estimators. Let early stopping pick the n_estimators (for example, stop if there is no improvement after 20 rounds).
- Try a handful of learning rates (say 0.3, 0.1, 0.05, 0.01) and pick the one that scores best on a validation set.
- Fix that learning rate, set your early stopping rounds, and fit; the stopping point gives you n_estimators.
- Use RandomizedSearchCV on max_depth with one or two regularizers like subsample and gamma.
- Fine-tune further with GridSearchCV if necessary; see the sketch after this list.
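A sketch of that sequence using the xgboost scikit-learn wrapper. The candidate learning rates, search ranges, and 20-round patience are illustrative assumptions, and passing early_stopping_rounds to the constructor assumes xgboost >= 1.6:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from xgboost import XGBClassifier

# Toy data so the sketch runs end to end; swap in your own dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=42)

# Step 1: try a few learning rates with early stopping and keep the best.
results = {}
for lr in [0.3, 0.1, 0.05, 0.01]:    # illustrative candidates
    model = XGBClassifier(
        n_estimators=2000,            # generous cap; early stopping trims it
        learning_rate=lr,
        early_stopping_rounds=20,     # stop after 20 rounds with no improvement
        eval_metric="logloss",
        random_state=42,
    )
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
    results[lr] = (model.best_score, model.best_iteration)

best_lr = min(results, key=lambda lr: results[lr][0])  # lower logloss is better
best_n = results[best_lr][1] + 1                       # trees actually used

# Step 2: random search over max_depth plus a couple of regularizers.
param_dist = {
    "max_depth": np.arange(3, 11),
    "subsample": np.linspace(0.5, 1.0, 6),
    "gamma": [0, 0.1, 0.5, 1, 5],
}
search = RandomizedSearchCV(
    XGBClassifier(n_estimators=best_n, learning_rate=best_lr,
                  eval_metric="logloss", random_state=42),
    param_distributions=param_dist,
    n_iter=30,
    cv=5,
    n_jobs=-1,
    random_state=42,
)
search.fit(X_train, y_train)
print(best_lr, best_n, search.best_params_)
```

Freezing n_estimators at what early stopping found keeps the random search cheap; you could instead re-run early stopping for every candidate, which is more faithful but costs more compute.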
Remember to watch out for overfitting, and validate your final model on a held-out test set that was never touched during tuning. Good luck!