June 9, 2026 5 min read

How Random Forests Can Save Our Forests

Firefighters battling a forest fire in Greece, August 2023

Forest fire in Greece, August 2023. Source: The Guardian

In August 2023, a single forest fire in north-eastern Greece scorched almost two hundred thousand acres of land, making it the biggest forest blaze in the European Union since 2000. The Guardian reported that the fire destroyed vast areas of forest and homes in the outlying areas of Alexandroupolis, the biggest city in the Evros region of Greece. A research paper published in Nature Communications suggests the Mediterranean region is the most vulnerable to forest fires in all of Europe; however, another article published by the European Geosciences Union suggests the problem has been encroaching on Central Europe in recent years. Among all the countries outside the Mediterranean region, Serbia has been most affected.

Forest fires are only one of the dangers to our forests. Deforestation, landslides, and forest pests are some of the other threats facing the lungs of the natural world. Predicting these problems accurately is the first and most significant step in fighting them. Traditionally, scientists have used statistical models like logistic regression to study these problems, but such models are highly susceptible to large errors when the data contain uncertainties and outliers. Machine Learning (ML) algorithms have proven to be more robust to such errors. With the recent advancements in ML, scientists are now better equipped to save our forests from their adversaries.

What is a Random Forest?

A group of researchers in Serbia studied the application of an ML algorithm called Random Forests, or RF, to predict forest fires in eastern Serbia. While the name may suggest a forest on a random island, RF is an ML tool that combines the predictions of multiple decision trees for more accurate results. It was created by Dr. Leo Breiman, a professor emeritus at UC Berkeley, and his student Dr. Adele Cutler. A decision tree algorithm recursively partitions data based on feature conditions to create a tree-like structure for making decisions or predictions. In an RF algorithm, multiple datasets are created from the given data by sampling it with replacement (bootstrapping). For each new dataset, a decision tree is trained using a randomly chosen subset of the available features. Finally, the predictions from all decision trees are aggregated, by majority vote for classification or by averaging for regression.

Diagram showing how a Random Forest algorithm works — multiple decision trees feed into a majority voting / averaging step to produce a final result
How a Random Forest algorithm works: multiple decision trees vote together to produce a final prediction. Source: anasbrital98.github.io
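The three steps described above (resample the data, pick random features, aggregate the votes) can be sketched in a few lines of plain Python. This is a toy illustration, not the researchers' implementation: each "tree" is a one-split stump, the function names and the eight data points are made up for the example, and the feature subset is drawn once per tree as the text describes, whereas common library implementations typically draw a fresh subset at every split.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_indices):
    """Fit a one-split decision tree (a 'stump') using only the given features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_indices:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send data to both sides
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split found: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap, i.e. draw n rows with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Step 2: choose a random subset of features for this tree
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump(sample_rows, sample_labels, features))
    return forest

def predict_forest(forest, row):
    # Step 3: aggregate the individual predictions by majority vote
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical cells: (soil dryness from 0 to 1, distance to nearest city in km)
rows = [(0.90, 2), (0.80, 3), (0.85, 1), (0.95, 4),
        (0.20, 20), (0.10, 25), (0.30, 30), (0.15, 22)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = fire occurred, 0 = no fire

forest = train_forest(rows, labels)
print(predict_forest(forest, (0.99, 0.5)))   # a very dry cell close to a city
print(predict_forest(forest, (0.05, 28.0)))  # a wet, remote cell
```

No single stump sees all the data or all the features, yet the vote across many of them is stable. That is the intuition behind RF's tolerance of noisy inputs.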

Predicting Forest Fires

The Serbian researchers applied RF to predict forest fire incidents and understand the impact of natural and anthropogenic factors on them. They studied fire locations in over 750,000 acres of land with broad-leaved, conifer, and mixed forests between 2001 and 2018. Dividing satellite images of the region into a grid of cells, each representing one square kilometre, they analysed whether a historical fire incident had occurred within each cell. For each cell, they then used RF and logistic regression (LR) to predict whether a future incident would occur.
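To make the gridding step concrete, here is a rough sketch of how fire locations might be turned into one label per square-kilometre cell. The coordinates are invented and assumed to be already projected into kilometres; `cell_of` and `label_cells` are hypothetical helper names, and the study's actual preprocessing is not described in this article.

```python
import math

CELL_SIZE_KM = 1.0  # each cell covers one square kilometre

def cell_of(x_km, y_km, cell_size=CELL_SIZE_KM):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(x_km / cell_size), math.floor(y_km / cell_size))

def label_cells(fire_points, n_cols, n_rows):
    """Label every cell 1 if any historical fire fell inside it, else 0."""
    burned = {cell_of(x, y) for x, y in fire_points}
    return {(c, r): int((c, r) in burned)
            for c in range(n_cols) for r in range(n_rows)}

# Hypothetical fire locations (not real data); two of them share a cell
fires = [(0.4, 0.7), (0.9, 0.2), (2.5, 1.1)]
cell_labels = label_cells(fires, n_cols=3, n_rows=2)

print(cell_labels[(0, 0)])        # 1: two fires fell in this cell
print(cell_labels[(1, 0)])        # 0: no recorded fire
print(sum(cell_labels.values()))  # 2 burned cells out of 6
```

Each cell's label, paired with that cell's environmental features, becomes one training example for a classifier such as RF or LR.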

91.7%
overall prediction accuracy of the Random Forest model, compared to 86.5% for logistic regression

The RF model also identified dry soil conditions as the most significant factor in forest fires, followed by distance to a city. Their studies did show, however, that RF was less effective than LR at predicting incidents with a lower probability of fire, a caveat worth noting when deploying these models in practice.

Side-by-side fire susceptibility maps comparing Logistic Regression (LR) and Random Forest (RF) outputs for a region of Serbia, coloured from very low (green) to very high (red) risk
Fire susceptibility maps generated by LR (left) and RF (right) for a region in Serbia. RF captures more spatially nuanced high-risk (red) zones in the south. Source: MDPI, 2019
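A single overall accuracy figure can hide uneven performance across classes, which is why the caveat above matters. The sketch below, using made-up labels rather than the study's data, shows how a model can score well overall while doing poorly on one class.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all cells whose prediction matches the observation."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for the cells of each observed class."""
    result = {}
    for cls in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == cls]
        result[cls] = sum(p == cls for p in preds) / len(preds)
    return result

# Hypothetical labels: 1 = fire, 0 = no fire
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1]

print(overall_accuracy(y_true, y_pred))    # 0.8
print(per_class_accuracy(y_true, y_pred))  # {0: 0.5, 1: 1.0}
```

Here the model is right 80% of the time overall, but only half of the no-fire cells are classified correctly.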

Predicting Landslides

Forest fires are not the only major threat to our forests. Landslides cause significant habitat loss for both flora and fauna living under the forest canopy. According to the Italian Institute for Environmental Protection and Research (ISPRA), Italy suffered from 645 major landslides between 2017 and 2020. In 2015, researchers from ISPRA collaborated with those at Sapienza University of Rome to study susceptibility to shallow landslides in northeastern Sicily. They applied RF and LR to study the relationships between landslides and factors like land use, lithology (classification of rocks), agricultural terrain, and forest fires.

The RF model found higher susceptibility correlated with steeper slopes, the presence of burned areas, concave plan curvature, pastures and crops, and proximity to streams. Lower susceptibility correlated with forests, shrub vegetation, and flat coastal areas. Moreover, the RF model was better than LR at minimising false positives (areas predicted as unstable but observed as stable).
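One way to quantify "minimising false positives" is the false positive rate: among the areas observed to be stable, the share a model wrongly flags as unstable. The sketch below compares two hypothetical sets of predictions; the numbers are illustrative only and say nothing about the actual results in Sicily.

```python
def false_positive_rate(observed, predicted):
    """Share of observed-stable areas (0) that were predicted unstable (1)."""
    stable_preds = [p for o, p in zip(observed, predicted) if o == 0]
    return sum(stable_preds) / len(stable_preds)

# Hypothetical map cells: 1 = unstable (landslide), 0 = stable
observed = [0, 0, 0, 0, 0, 1, 1, 1]
model_a = [0, 0, 0, 0, 1, 1, 1, 1]  # flags one stable cell by mistake
model_b = [0, 0, 1, 1, 1, 1, 1, 0]  # flags three stable cells by mistake

print(false_positive_rate(observed, model_a))  # 0.2
print(false_positive_rate(observed, model_b))  # 0.6
```

A lower rate means fewer stable slopes are needlessly treated as hazards, which matters when the map guides where limited mitigation resources are spent.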

Studying Deforestation

Although landslides are natural, several human activities like deforestation trigger them. Whether it is to build new settlements or graze the animals we feed on, cutting trees down has been our least eco-friendly outdoor activity since the dawn of civilisation. Deforestation stands out as the foremost anthropogenic catalyst in the degradation of forests, exerting the most pronounced effects on their health. Studying deforestation and its impacts is one of the critical steps to saving our forests.

Researchers from Brazil and the UK conducted a joint study relating metrics of deforestation (DF) and forest fragmentation (FF) to socioeconomic and biogeophysical factors in the Brazilian Atlantic Forests. Forest fragmentation refers to the loss of forest area and the division of the forest into smaller, isolated blocks. They implemented RF and multiple-step LR models across an area of over five hundred municipalities with major industries like cattle rearing and coffee production. The RF model identified minimum distance to road and longitude as the biggest contributors to the growth rate of deforestation, and found a strong positive correlation between forest fragmentation and road density. They concluded that RF is a better tool than LR for analysing the combined impact of topographical and socioeconomic factors on forests.

Tracking Forest Pests

Not every adversary of our forests is gigantic. While forest pests may be minion-sized, their detrimental effect on forest health has concerned organisations worldwide. According to the Food and Agriculture Organization (FAO), forest pests adversely affect the growth of trees and the yield of wood and non-wood products essential to forest biodiversity. An article in Proceedings of the National Academy of Sciences showed that carbon storage lost due to pest invasions is equivalent to the carbon emissions of five million vehicles.

One invasive species of concern is the Citrus Flatid Planthopper (CFPI) in the forests of South Korea. Researchers from several Korean institutes collaborated to study the occurrence patterns of CFPI using RF. Studying 105 sites each for presence and absence of CFPI, they identified distance to road as the most important predictive variable, and concluded that CFPI occurrence was most probable in urban and agricultural areas at low altitudes. Their results demonstrated how human activities close to a forest can make it more prone to invasive species.
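The article does not say how the study ranked its predictive variables. One common, model-agnostic approach is permutation importance: shuffle one variable at a time and measure how much the model's accuracy drops. The sketch below assumes a stand-in `model` function and invented site data, so treat it as an illustration of the idea rather than the researchers' method.

```python
import random

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_features, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature's values are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for f in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)  # break the link between feature f and the label
            shuffled = [r[:f] + (v,) + r[f + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical sites: (distance to road in km, altitude in m)
rows = [(0.2, 50), (0.5, 300), (0.8, 120), (0.3, 700),
        (2.5, 80), (4.0, 400), (3.1, 150), (5.2, 900)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = pest present, 0 = absent

# Stand-in for a trained model: predicts presence within 1 km of a road
def model(row):
    return int(row[0] < 1.0)

importances = permutation_importance(model, rows, labels, n_features=2)
print(importances)  # road distance gets a positive score, altitude gets 0.0
```

Because the stand-in model ignores altitude, shuffling it changes nothing and its importance is exactly zero, while scrambling distance to road hurts accuracy.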

Looking Forward

As one more tree succumbs to an axe and we lose another acre of forest to destruction, our job of saving our forests becomes more challenging. We must take advantage of the advanced tools scientists and researchers have created. Random Forest is one such tool — capable of effectively recognising intricate, interactive, and non-linear connections between responses and predictors across wildly different problem domains. The future of our forests relies on whether we restrict the applications of machine learning tools like Random Forest to making our lives more convenient, or utilise them to save the natural world.

References
  1. Agence France-Presse. Greece wildfires declared largest ever recorded in EU. The Guardian, 29 August 2023.
  2. IBM. What is Random Forest? ibm.com/topics/random-forest, retrieved December 18, 2023.
  3. D. Lee et al. Proceedings of the National Academy of Sciences of the USA, 116(35), August 12, 2019.
  4. Researchers from ISPRA & Sapienza University of Rome. Susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology, 249, 2015.
  5. Forest fire susceptibility using Random Forest Model. MDPI, 12 July 2019.
  6. Forest Fragmentation and Deforestation study, Brazilian Atlantic Forest. Environmental and Ecological Statistics, 22 November 2017.
  7. Non-stationary climate–fire models. Nature Communications, 9, 2018.
  8. Forest-induced landslides in Italy. Earth System Science Data, 15(7), 2023.
  9. Random Forest Method study. Multidisciplinary Digital Publishing Institute (MDPI), December 22, 2020.