They perform well with mixed numerical and categorical data, require little tuning to produce a reasonable first version of a predictive model, are fast to train, are intuitive to understand, provide feature-importance estimates as a by-product of training, are inherently able to handle missing data, and have been implemented in virtually every major language. Random forests also make no strong assumptions about the scale or normality of incoming data.
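As a quick illustration of two of these properties, here is a minimal sketch assuming scikit-learn: the features are left on wildly different scales with no preprocessing, and the importance scores are read directly off the fitted model. The synthetic data and variable names are illustrative only, not from any particular dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Three synthetic numeric features on very different scales; no
# standardization or normalization is applied before fitting.
X = rng.normal(size=(500, 3)) * np.array([1.0, 100.0, 0.01])
y = (X[:, 0] + X[:, 1] / 100.0 > 0).astype(int)

# Library defaults often yield a reasonable first model.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Importance scores come as a by-product of training.
print(clf.feature_importances_)
```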
The random forest, first described by Breiman (2001), is an ensemble approach for building predictive models. Due to their simple nature, lack of assumptions, and generally high performance, random forests have been applied in nearly every domain where machine learning is used. The “forest” in this approach is a collection of decision trees that act as “weak” learners: individually they are poor predictors, but in aggregate they form a robust prediction.
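To make the ensemble mechanism concrete, here is a from-scratch sketch that trains each decision tree on a bootstrap sample of the rows and combines the trees by majority vote. It is an illustration of the idea rather than Breiman’s exact algorithm; the function names `fit_forest` and `predict_forest` are hypothetical, scikit-learn’s `DecisionTreeClassifier` is assumed as the base learner, and binary 0/1 labels are assumed for the vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=50, seed=0):
    """Train n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # Sample rows with replacement so each tree sees a different view.
        idx = rng.integers(0, len(X), size=len(X))
        # max_features="sqrt" considers a random feature subset at each split.
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=rng.integers(1 << 31))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    """Aggregate the individual (weak) trees by majority vote."""
    votes = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
    # For binary 0/1 labels, the majority vote is the mean crossing 0.5.
    return (votes.mean(axis=0) > 0.5).astype(int)
```

Any single tree in this sketch overfits its bootstrap sample, but averaging the votes across many decorrelated trees is what gives the forest its robustness.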