Machine learning in biohydrogen production: a review

Biohydrogen is emerging as a promising carbon-neutral and sustainable energy carrier with high energy yield to replace conventional fossil fuels. However, the commercial uptake of biohydrogen is mainly hindered by the supply side. As a result, various operating parameters must be optimized to realize biohydrogen commercial uptake on a large scale. Recently, machine learning algorithms have demonstrated the ability to handle large amounts of data while requiring less in-depth knowledge of the system and being capable of adapting to evolving circumstances. This review critically examines the role of machine learning in categorizing and predicting data related to biohydrogen production. The accuracy and potential of different machine learning algorithms are reported. Also, the practical implications of machine learning models to realize biohydrogen uptake by the transportation sector are discussed. The review indicates that machine learning algorithms can successfully model non-linear and complex interactions between operational and performance parameters in biohydrogen production. Additionally, machine learning algorithms can help researchers identify the most efficient methods for producing biohydrogen, leading to a more sustainable and cost-effective energy source.


Introduction
Researchers are working on renewable resources to produce clean energy substitutes for fossil-based fuels. This is driven by rising concerns about climate change, increasing oil prices, and health issues caused by airborne pollutants (Vassilev and Vassileva, 2016). Only 10% of the world's energy demand is met by modern biomass conversion, while the remaining 90% comes from fossil fuels such as coal, natural gas, and oil (Shuttleworth et al., 2014). Since energy is crucial for any nation's economy, many countries seek reliable ways to make alternative fuels (Kaloudas et al., 2021). Some alternative fuels have an energy density close to that of fossil fuels and thus may replace them to solve concerns about carbon footprint (Roy et al., 2015; Shanmugam et al., 2020). Biomass-derived biofuels are also promising for addressing potential energy shortages in the future (Srivastava et al., 2020).
Biofuels derived from modern biomass are carbon-neutral and renewable. Unlike fossil fuels, which release carbon dioxide (CO2) sequestered for millions of years, biomass-derived biofuels are made from recently grown plants and thus do not contribute to increased atmospheric CO2 levels (Saravanan et al., 2022). As a result, biofuels are a promising alternative to fossil fuels to lower greenhouse gas emissions and mitigate climate change. Another advantage of biomass-derived biofuels is that they can be made from a range of biomass sources, including agricultural residues, forestry waste, and energy crops (Demirbas, 2009). This means they can be manufactured locally, reducing reliance on imported fossil fuels and improving energy security. However, one of the main challenges in producing biomass-derived biofuel is breaking down the substrate into relatively simple moieties.
Recently, hydrogen has shown the potential to be an excellent surrogate fuel because it has a higher energy density than other biofuels, at approximately 140 MJ/kg; it is quickly produced and transported and can be used directly in fuel cells to produce energy (LewisOscar et al., 2015; Nagarajan et al., 2017; Show et al., 2018; Kumar et al., 2019). Nonetheless, these benefits are hindered by process limitations at large scale (Sivagurunathan et al., 2016). Therefore, different ways to upscale the process are currently being investigated. As organic substrates are readily available, process parameters can be optimized to boost hydrogen production rates, and biological methods and genetic modification can enhance hydrogen yields (Nath and Das, 2011; Sivagurunathan et al., 2016; Zhao et al., 2017). Compared to other biofuels, biohydrogen stands out for being both carbon-free and high in energy density. However, improving biohydrogen production is challenging due to the complexity of biohydrogen generation systems.
Nanotechnologies have recently been developed for the farming, food, pharmaceutical, and energy industries (LewisOscar et al., 2016; Chari et al., 2017; Vasantharaj et al., 2019). For instance, nanomaterials improve many biological processes because of their effect on microbial growth, intracellular electron transfer, and the interaction of metalloenzymes responsible for hydrogen yield (Liu and Tang, 2017). Therefore, nanomaterials can boost biohydrogen production (Yang and Wang, 2018). Nanoparticles facilitate electron transfer between microorganisms and electrodes or other electron acceptors, improving the efficiency of biohydrogen production (Cheng et al., 2020). There has been considerable interest in using nanoparticles as additives to boost biohydrogen production, and a few studies have shown that this can be an effective strategy (Kumar et al., 2019).
The recent progress in machine learning can potentially create new opportunities in large-scale biohydrogen production. Indeed, machine learning can be applied to analyze large datasets and identify patterns that could be valuable for designing efficient industrial processes (Bertolini et al., 2021). Large datasets from bioreactors can be analyzed using machine learning algorithms to identify patterns and correlations between different process variables. This data can be used to improve the yield and efficiency of biohydrogen production by optimizing process parameters such as temperature, pH, and nutrient concentrations. This could enable the development of tailored solutions for biohydrogen production and optimize the efficiency of the production processes.
For example, machine learning could be used to identify the best combination of factors for increasing the yield of biohydrogen production from a specific strain or environment. This can be accomplished by training machine learning models on large microbial genome and metabolic datasets and then employing these models to predict the metabolic pathways and energy production potential of various microbial strains. This data can be used to identify the most promising microbial strains for biohydrogen production and to develop targeted genetic engineering strategies to improve their performance even further. Additionally, machine learning algorithms are used to develop predictive models for biohydrogen production, identify the optimal conditions for biohydrogen production, and optimize the process accordingly (Kumar Sharma et al., 2022; Pandey et al., 2023). Finally, machine learning could be applied to develop new real-time tools for controlling and managing the production process. This review, therefore, provides a comprehensive account of machine learning's role in biohydrogen production. Recent advances in machine learning-assisted biohydrogen techniques are discussed. The anticipated production of biohydrogen from waste is also covered. The scientific and technological roadblocks and pathways to machine learning use in biohydrogen production are outlined. In addition, a patent landscape analysis of machine learning-enabled biohydrogen production is presented. A comparison of the present review with previously published reviews in this domain is provided in Table 1.

Importance of biohydrogen
"Biohydrogen" refers to the dihydrogen gas (H2) produced by microorganisms like bacteria, archaea, and algae. Biohydrogen can be manufactured biologically in several ways, such as through hydrogen-producing bacteria, i.e., fermentation (Wang et al., 2012), microbial electrolysis cells (Cardeña et al., 2019), and biophotolysis (Ghirardi et al., 2014). These methodologies, which utilize biowastes, are not only cheaper than other energy-producing methods, but they also produce no pollution. Biohydrogen offers a suitable replacement for carbon-based fuels and stands out as a potential clean energy carrier (Sung et al., 2003; Carere et al., 2008).
Since it is feasible to produce biohydrogen from non-depletable resources, especially wastes (Dutta et al., 2022), waste processing problems and land pollution from landfilling can also be fixed simultaneously (Saratale et al., 2008; Panagiotopoulos et al., 2009; Kiran et al., 2014). As a result, making biohydrogen from waste is attracting a lot of interest. Consequently, biohydrogen is considered the future's primary energy carrier (Kapdan and Kargi, 2006). Manufacturing models proposed for hydrogen production from biological biomasses can generate energy with limited density and elevated heating value (Rachman et al., 1997; Yokoi et al., 1998).
In contrast to the extreme pressures and temperatures required for hydrogen generation from natural gas, biohydrogen production methods can be engineered to be carbon neutral. Hydrogen is produced in large quantities annually for use in manufacturing processes, but its use as an energy source to substitute fossil fuels is minimal. Ammonia generation for fertilizer, petroleum cracking, and methanol manufacturing are the primary industrial uses for hydrogen. Biomethane is a crucial bioenergy source, but cost-effective biohydrogen generation may eventually replace it (Powell et al., 2012). Compared to conventional production techniques like the steam reforming of petroleum products and water electrolysis, biological hydrogen manufacturing processes are safer for the environment and require minimal energy (Kapdan and Kargi, 2006). As a result, biohydrogen draws attention worldwide due to its potential to serve as a limitless and inexpensive renewable energy provider. Although global attention has grown, this field of research remains in its infancy (Show et al., 2012). The most frequently used keywords in "biohydrogen" research are depicted in Figure 1, showing the preponderance of the keywords "biohydrogen", "fermentation", and "dark fermentation". Figure 1 is based on co-occurrence analysis with "all keywords" as the unit of analysis using the VOSviewer tool.

Overview of biohydrogen production
The most common method for biohydrogen production is microbial electrolysis (Varanasi et al., 2019). In this process, microbial fuel cells convert organic matter into electricity, which is then used to produce hydrogen gas from water (Varanasi et al., 2019). Other methods for biohydrogen production include photobiological hydrogen production (Touloupakis and Torzillo, 2019), dark fermentation (Kumar et al., 2015), and photo fermentative hydrogen production (Hallenbeck, 2013). These methods all involve using biological organisms to produce hydrogen, and they are becoming increasingly popular due to their sustainability and low cost.
The most important step in the cost-effective production of biohydrogen is to turn complex organic feedstock into simple glucose that can be fermented (Zhang et al., 2015). Furthermore, the type of biomass used for biohydrogen production dictates the pretreatment method required. For example, lignocellulosic materials require comprehensive pretreatment due to their complex structure, consisting of cellulose, hemicellulose, and lignin (Srivastava et al., 2016; Shanmugam et al., 2019). Since lignocellulosic materials contain high concentrations of hemicellulose and cellulose-derived polymers, they are considered good sources for biohydrogen production (Srivastava et al., 2016; Shanmugam et al., 2019). Hydrolytic enzymes are necessary to break down carbohydrate polymers into simple sugars, but recalcitrant lignin constituents limit their activity, thus limiting the material's usefulness. To this end, enzymes that break down lignin and convert hemicellulose and cellulose molecules into simple sugars have been frequently investigated during the pretreatment phase (Sherpa et al., 2018). Free enzymes in pretreatment have many benefits, including using energy more efficiently, catalyzing specific degradations or conversions, and not generating toxic compounds that inhibit fermentation (Sherpa et al., 2018). However, free enzymes have a short shelf life, are expensive, and cannot be recycled (Macrelli et al., 2014). Therefore, there has also been a lot of focus on immobilizing enzymes on nano supports so they can be used in biomass pretreatment to produce biohydrogen (Zhang and Shen, 2007). Nanomaterials seem well suited for immobilizing hydrolytic enzymes, which greatly enhances the effectiveness of the pretreatment method, owing to their relatively large surface area and variety of chemical and physical characteristics. This approach is also economical because the nano-immobilized bioactive molecules can be retrieved and used again (Rai et al., 2019). Recently, researchers found that biohydrogen production could be enhanced when lignocellulose-degrading enzymes like hemicellulase, laccase, and cellulase were immobilized on nano supports (Chang et

Dark fermentation
Dark fermentation is a process by which organic matter is broken down in an anaerobic environment to produce biohydrogen. The process involves using microorganisms, such as bacteria, to break down the organic matter and produce hydrogen as a by-product (Kim et al., 2013). Dark fermentation is an efficient way to produce hydrogen and is known for rapidly producing energy from various renewable substrates (Kumar et al., 2015). Because it does not use

Photofermentation
In the absence of oxygen, purple non-sulfur photosynthetic bacteria undergo a fermentative transformation of organic matter into hydrogen and CO2, known as photofermentation. Light is harnessed for its energy. These purple non-sulfur photosynthetic bacteria utilize reverse electron transport to reduce ferredoxin, capturing light energy to produce adenosine triphosphate (ATP) and high-energy electrons. Subsequently, nitrogenase reduces protons to hydrogen using ATP and reduced ferredoxin. Nitrogenase activity can produce hydrogen even in the absence of nitrogen. The route for producing biohydrogen from biomass in a single phase using photofermentation is depicted in Figure 3. The excess organic acids in the hydrogenic effluent of dark fermentation can be converted to biohydrogen by purple non-sulfur photosynthetic bacteria using photofermentation. One of the main benefits of photofermentation is that it can theoretically produce more hydrogen than dark fermentation. Combining dark fermentation and photofermentation was demonstrated to boost hydrogen yield from hexose and pentose fermentation from 4.2 to 12.1 mol-H2 per mol-hexose and from 2.1 to 10.2 mol-H2 per mol-pentose, respectively (Jacob et al., 2015), an improvement that is necessary for a method to be efficient and financially viable. To produce hydrogen, photofermentation uses the hydrogenic effluent from the dark fermentation stage as a substrate for the purple non-sulfur photosynthetic bacteria. Thus, the process improves energy recovery from the substrate and addresses the issue of low substrate energy conversion in dark fermentation. Figure 4 depicts combined dark fermentation and photofermentation to produce biohydrogen from biomass.
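As a quick check on the yields quoted from Jacob et al. (2015), the relative gain from coupling the two stages can be worked out directly. The snippet below is illustrative arithmetic only; the dictionary layout and the function name `fold_increase` are assumptions introduced here, not part of the cited study.

```python
# Yields quoted in the text (Jacob et al., 2015), in mol H2 per mol sugar.
yields = {
    "hexose": {"dark_only": 4.2, "dark_plus_photo": 12.1},
    "pentose": {"dark_only": 2.1, "dark_plus_photo": 10.2},
}


def fold_increase(before: float, after: float) -> float:
    """Ratio of the combined-process yield to the dark-fermentation yield."""
    return after / before


gains = {
    sugar: fold_increase(values["dark_only"], values["dark_plus_photo"])
    for sugar, values in yields.items()
}

for sugar, gain in gains.items():
    print(f"{sugar}: {gain:.2f}-fold increase in hydrogen yield")
```

On these figures, the combined process gives roughly a 2.9-fold gain for hexose and a 4.9-fold gain for pentose.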

Other methods
Hydrogen can also be produced from hemicellulose obtained from the hydrolysis of lignocellulosic materials (Akubude et al., 2021). Hemicellulose can be extracted from various plant materials, including agricultural residues, and converted into hydrogen via dark fermentation (Akubude et al., 2021). During dark fermentation, hemicellulose is broken down by microorganisms in the absence of light, and the resulting organic acids are then converted into hydrogen gas through a series of biochemical reactions. This process has the potential to be a sustainable and renewable source of hydrogen, as it utilizes agricultural wastes that would otherwise be discarded. Figure 5 shows that hydrogen can be obtained through the fermentation of xylose by anaerobic microorganisms. Other processes, such as electrolysis (Cardeña et al., 2019) and gasification (Cao et al., 2020), can also be used to produce biohydrogen. In electrolysis, an electric current is used to split water molecules into hydrogen and oxygen (Cardeña et al., 2019). This can be done using renewable energy sources like wind, solar, or hydropower. The resulting biohydrogen can be considered a sustainable fuel if the electricity used in the process comes from renewable sources. Gasification, on the other hand, is a process in which organic materials such as biomass, coal, or municipal waste are converted into a gas by heating them in the absence of oxygen (Cao et al., 2020). The resulting gas can be a mixture of carbon monoxide, hydrogen, and other gases, depending on the feedstock and the conditions of the gasification process. The hydrogen can then be separated from the other gases and purified for use as a fuel.

Machine learning models in biohydrogen production
Machine learning models have been used to better understand and optimize the production process of biohydrogen. One approach is to use data mining and predictive modeling to identify the factors influencing biohydrogen production. This can be done by analyzing the data generated from various experiments conducted on the production process. From this analysis, predictive models can be developed that help researchers identify the conditions that lead to higher production yields and the factors that should be adjusted to optimize the process. Another approach is to apply reinforcement learning algorithms (Pandey et al., 2023). By learning from trial and error, these algorithms can identify the optimal conditions for biohydrogen production.
Among different machine learning models, artificial neural networks (ANN) are complex machine learning models used to identify patterns in data and make predictions (Rodríguez-Hernández et al., 2021). ANN models were explored to develop models for biohydrogen production processes by identifying the best combinations of parameters for the highest yield. Figure 6 shows that most ANN models display prediction accuracy values higher than 0.90 and, in some cases, even 0.99, indicating excellent prediction accuracy. In work by Monroy et al. (2018), biohydrogen production through photofermentation with an immobilized consortium of photo-bacteria was simulated, demonstrating the prospects of ANN as a modeling technique. To build the ANN model, a series of controlled, indoor, batch-operated experimental fermentations were conducted at 30 °C with varying light levels, initial pH, and metals such as vanadium, molybdenum, and iron introduced to the medium. The framework was then cross-validated using data from indoor photofermentations. The data-based framework was created by comparing various ANN architectures. The selected architecture showed the highest degree of similarity between the ANN model's predictions and the actual biohydrogen production. By comparing the model's predicted kinetics to those obtained from experiments, the researchers could see that the model could anticipate biohydrogen production. The validity and generalizability of the ANN-based framework were confirmed by testing it on an outdoor fermentation in which the light intensity varied throughout the process.
In another work, the authors employed neural network models combined with genetic algorithms to optimize biohydrogen production. To determine the efficacy of the experiments, the data were first analyzed by a neural network. The results showed that a network with a 4-10-1 topology, i.e., with 10 neurons in the hidden layer, performed well. The neural network achieved a prediction accuracy of 99.99%. Training data with known input and output values showed a mean absolute percentage error of 3×10⁻¹⁰, a mean absolute error of 3.4×10⁻⁸, and a mean square error of 9×10⁻⁸, indicating precision in the predictions (Prakasham et al., 2011).
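The three error measures reported for this network can be written out explicitly. The following is a minimal sketch in plain Python; the sample values are hypothetical and are not data from Prakasham et al. (2011), and expressing the mean absolute percentage error as a percentage (rather than a fraction) is an assumption made here.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_square_error(actual, predicted):
    """Average squared difference; penalizes large deviations more strongly."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average relative error in percent; undefined if any actual value is 0."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical hydrogen yields (arbitrary units), for illustration only.
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]

mae = mean_absolute_error(actual, predicted)
mse = mean_square_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(f"MAE={mae:.3f}  MSE={mse:.3f}  MAPE={mape:.2f}%")
```

All three measures approach zero as predictions approach the measured values, which is why the very small values reported above are read as a sign of a close fit on the training data.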
In one study, biohydrogen production was investigated using a multilayer perceptron artificial neural network (MLPANN) and the microbial kinetic with Levenberg-Marquardt algorithm (MKLMA) derived from microbial growth. The kinetics of significant metabolites were modeled using the MLPANN and the MKLMA during dark fermentation. From the total data, after 24 h of fermentation, the MLPANN with a response surface model was used to design the electron-equivalent balance during dark fermentation. Comparisons of model uncertainties were made using expanded experimental data of kinetic data and cumulative data. The authors used MLPANN and MKLMA to investigate the kinetics of the major metabolites from small experimental data sets. By using the MLPANN and response surface model to statistically analyze the effect of the studied process parameters on these primary metabolites from an electron-equivalent balance perspective, a new, effective method was suggested for describing the complex biohydrogen production during dark fermentation (Wang et al., 2021).
To maximize biohydrogen production and enhance the efficacy of biogas production, an approach was designed to examine the impact of volatile fatty acids (Mahmoodi-Eshkaftaki et al., 2022). Regression models must be robust for calculating responses in time-dependent methods with limited data. Therefore, a deep neural network on the volatile fatty acids was proposed to estimate biogas production. The deep neural network model predicted the impact of time on biogas production better than the regression models. Thus, the impact of every volatile fatty acid on the biogas components was determined by employing the time-dependent capabilities provided by the deep neural network model (Mahmoodi-Eshkaftaki et al., 2022).
Recently, several machine learning methods other than neural networks were evaluated using the mean square error and R² to choose the most reliable models for modeling the biohydrogen process. Grid search optimization and permutation variable importance analysis revealed that the gradient boosting machine, support vector machine, random forest, and AdaBoost were the best models for determining the most critical aspects of the biohydrogen production process. High R² values of 0.89, 0.89, 0.9, and 0.89, and low mean square error values of 0.015, 0.015, 0.016, and 0.015 for the gradient boosting machine, support vector machine, random forest, and AdaBoost models, respectively, indicate their effectiveness in predicting hydrogen production (Hosseinzadeh et al., 2022a).
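To make the two evaluation ideas in this paragraph concrete, the sketch below computes R² and a simple permutation variable importance in plain Python. It is a hand-written illustration, not the procedure of Hosseinzadeh et al. (2022a): `toy_model` is a hypothetical stand-in for a fitted regressor, the three features are synthetic, and the number of shuffles is an arbitrary choice.

```python
import random


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in R² when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = r_squared(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r_squared(y, [model(r) for r in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model: strong on feature 0, weak on 1, ignores 2."""
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
X = [[data_rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
print([round(s, 3) for s in scores])
```

A feature whose shuffling barely changes R² contributes little to the predictions, which is the logic used to rank process variables by importance.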

Applications of machine learning to optimize biohydrogen production
Classical statistical optimization cannot capture the complexity and nonlinearity of the dynamic interaction in the biohydrogen process.However, by integrating data-driven models based on machine learning, researchers can get around the restrictions of traditional methods and model correctly, quickly, and at a low cost (Mohd Asrul et al., 2022).
Using advanced algorithms and predictive models, machine learning can help identify the most efficient ways to produce and store biohydrogen. It can also help identify the optimal conditions for biohydrogen production. Additionally, machine learning can be used to develop strategies for optimizing production processes and controlling the behavior of microorganisms in biohydrogen production systems. Ultimately, this will improve production efficiency, cost savings, and environmental benefits. Machine learning is similarly a powerful tool for monitoring biohydrogen production. It can be used to analyze data from various sources, such as sensors and images, to identify patterns and anomalies and predict future trends. For instance, machine learning algorithms can be used to detect changes in the pH of a biohydrogen production system so that adjustments can be made to optimize the process. Machine learning can also be used to interpret data from sensors that measure temperature, pressure, and other parameters so that operators can make better decisions about the system's operation.
Additionally, machine learning algorithms can be used to interpret images of the biohydrogen production system, such as photos taken with a microscope, to identify cells and microbial activity. This information can then be used to refine the process and optimize biohydrogen production. To forecast the temporal profile of hydrogen production in batch experiments, an ANN framework was established (Nasr et al., 2013). The ANN was designed with a 5-6-4-1 layer backpropagation configuration. Substrate and biomass concentrations, time, temperature, and initial pH were the inputs to the ANN. The researchers used 312 data points culled from 25 different studies to train the model. During training, validation, and testing, the experimental and predicted hydrogen production had a correlation coefficient of 0.989. The ANN also accurately predicted the hydrogen production profile of new data, with a correlation coefficient of 0.98 (Nasr et al., 2013).
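The 5-6-4-1 configuration can be pictured as five inputs feeding two hidden layers of six and four neurons and a single output. The sketch below shows only the forward pass and the parameter count of such a network; the tanh activation, the random untrained weights, and the scaled sample inputs are assumptions made for illustration, and the backpropagation training used by Nasr et al. (2013) is omitted.

```python
import math
import random

# Inputs, two hidden layers, one output (hydrogen production).
LAYER_SIZES = [5, 6, 4, 1]


def init_network(layer_sizes, seed=0):
    """Create untrained (random) weights and zero biases for each layer."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        network.append((weights, biases))
    return network


def forward(network, inputs):
    """Propagate inputs layer by layer; hidden layers use tanh, output is linear."""
    activations = inputs
    for index, (weights, biases) in enumerate(network):
        sums = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        is_output = index == len(network) - 1
        activations = sums if is_output else [math.tanh(s) for s in sums]
    return activations


def count_parameters(layer_sizes):
    """Weights plus biases over all layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


network = init_network(LAYER_SIZES)
# Hypothetical inputs scaled to [0, 1]: substrate concentration, biomass
# concentration, time, temperature, initial pH.
sample = [0.4, 0.2, 0.5, 0.6, 0.3]
prediction = forward(network, sample)
n_params = count_parameters(LAYER_SIZES)
print(f"parameters: {n_params}, untrained output: {prediction[0]:.4f}")
```

With 69 adjustable parameters against 312 training points, a network of this size is small relative to the dataset described above.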
In another work, researchers attempted to optimize the primary operating variables for hydrogen generation via photofermentation by creating a new hybrid fuzzy clustering-ranking method coupled to a radial basis function (RBF) neural network. Rhodospirillum rubrum, a light-dependent microorganism, served as the biocatalyst in the bioconversion of syngas to hydrogen through the water-gas shift reaction. An RBF neural network was used to correlate two exogenous input parameters with the exergetic outputs. A combined fuzzy clustering-ranking algorithm was devised and linked with the RBF model to improve both rational and process exergy efficiency while reducing normalized exergy destruction (Aghbashlo et al., 2016).

Biohydrogen production from wastewater
Machine learning algorithms can be used to predict the output of biohydrogen production from wastewater to achieve better yields. Additionally, machine learning can identify wastewater sources that may be more suitable for biohydrogen production and can be used to optimize the anaerobic digestion process. The use of machine learning can reduce the time and resources needed to optimize the biohydrogen production process and provide more efficient and cost-effective solutions for wastewater treatment. Figure 7 shows the interface of biohydrogen production and machine learning models. The agro-industry processing sector routinely produces millions of tonnes of wastewater each year. In keeping with the principles of the circular economy, it may be possible to recover bioenergy resources like biohydrogen from sewage at the same time as it is being treated. Scientists investigated the accuracy of different machine learning models for biohydrogen recovery (Safdar Hossain et al., 2022). A total of eight different data-driven machine learning algorithms were used to create the models, including the cubic support vector machine (CSVM), exponential quadratic Gaussian process regression (EQGPR), fine Gaussian support vector machine (FGSVM), binary neural network (BNN), rational quadratic Gaussian process regression (RQGPR), linear support vector machine (LSVM), quadratic support vector machine (QSVM), and exponential Gaussian process regression (EGPR). These models were trained and evaluated using collected data. The R² for predicting hydrogen generated from agro-industrial wastewater was below 0.69 for the CSVM, LSVM, and QSVM, indicating unimpressive performance. Better performance was exhibited by the FGSVM, BNN, RQGPR, EQGPR, and EGPR models, as indicated by R² > 0.9 (Safdar Hossain et al., 2022).
Please cite this article as: Alagumalai A., Devarajan B., Song H., Wongwises S., Ledesma-Amaro R., Mahian O., Sheremet M., Lichtfouse E. Machine learning in biohydrogen production: a review. Biofuel Research Journal 38 (2023) 1844-1858. DOI: 10.18331/BRJ2023.10.2.4
Recently, a study employed adaptive neuro-fuzzy inference systems (ANFIS) and ANN to forecast the transmembrane pressure as a critical operational parameter in an anaerobic membrane bioreactor-sequencing batch reactor during biohydrogen production (Taheri et al., 2021). With transmembrane pressure as the output variable, both models used organic loading rates between 0.5 and 8.0 g COD/L/d, effluent pH between 3.59 and 6.87, mixed liquor suspended solids between 4.61 and 21.52 g/L, and mixed liquor volatile suspended solids between 3.7 and 15.5 g/L as inputs. The ANFIS model, trained with a hybrid algorithm and four Gaussian membership functions, improved prediction performance. A backpropagation algorithm was used to train the feed-forward ANN model. The best architecture used the Levenberg-Marquardt training algorithm with 9 neurons in the hidden layer. The R² values for predicting transmembrane pressure using the ANFIS and ANN models were 0.9 and 0.8, respectively, while the mean square error of transmembrane pressure using the ANFIS model (7.1×10⁻³) was less than that using the ANN model (8×10⁻³). The ANFIS model's predictive accuracy for transmembrane pressure was therefore superior to that of the ANN model, as evidenced by its higher R² and lower mean square error. The sensitivity assessment of the ANN model concluded that the organic loading rate was the input parameter with the most significant influence on the range of transmembrane pressure variation (Taheri et al., 2021).
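The Gaussian membership function at the heart of the ANFIS input layer can be illustrated for one input, the organic loading rate. This is a minimal sketch of the fuzzification step only: the even spacing of the four centers, the width `sigma`, and the function names are assumptions made here, whereas in a trained ANFIS these parameters are tuned by the learning algorithm and combined through fuzzy rules.

```python
import math


def gaussian_membership(x, center, sigma):
    """Degree of membership of x in a fuzzy set with a Gaussian shape."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def evenly_spaced_centers(low, high, count):
    """Place membership-function centers uniformly across the input range."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


# Organic loading rate range from the text (g COD/L/d) and four functions.
OLR_LOW, OLR_HIGH, N_FUNCTIONS = 0.5, 8.0, 4
centers = evenly_spaced_centers(OLR_LOW, OLR_HIGH, N_FUNCTIONS)
# Assumed width: half the spacing between neighboring centers.
sigma = (OLR_HIGH - OLR_LOW) / (2 * (N_FUNCTIONS - 1))


def fuzzify(x):
    """Membership degree of x in each of the four fuzzy sets."""
    return [gaussian_membership(x, c, sigma) for c in centers]


degrees = fuzzify(3.0)
print([round(d, 3) for d in degrees])
```

An input of 3.0 g COD/L/d sits exactly on the second center, so it belongs fully to that fuzzy set and only partially to its neighbors.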

Biohydrogen production from fatty acids
Monitoring the process and predicting production, which involves the characterization of the biogas and the production of volatile fatty acids, is a crucial step in scaling up the biohydrogen production process (Sydney et al., 2020). Recently, researchers investigated the ability of ANN to forecast total hydrogen production based on volatile fatty acid generation (Sydney et al., 2020). The model inputs were time, acetate, and butyrate concentrations (model 1); time, lactate, propionate, butyrate, and acetate concentrations (model 2); time and the sum of all volatile fatty acids (model 3); and time, butyrate, and acetate concentrations (model 4). With an R² greater than 0.98, all models accurately predicted either the total amount of biohydrogen produced or its rate of production, as well as the yield. The sum of volatile fatty acids is the recommended input parameter for procedures involving pure cultures, whereas an acetate/butyrate model is preferred for methods involving complex/mixed cultures. The accumulated biohydrogen generation rate can be predicted precisely using ANN frameworks that depend on volatile fatty acid species diversity and quantity. Future studies assessing the ability of ANN to handle disturbances are required to pave the way for its use at a realistic scale. It is feasible to create an ANN biohydrogen forecasting tool based on volatile fatty acid generation and profile, particularly when instabilities in the bioprocess occur. The metabolic processes of the microorganisms and the species diversity of volatile fatty acids are essential considerations when selecting input variables for such a model's development (Sydney et al., 2020).

Biohydrogen production from organic waste
As mentioned earlier, biohydrogen can be produced from waste by dark fermentation (Balachandar et al., 2013). This process uses bacteria to break down organic material such as food waste or wastewater sludge and release hydrogen gas as a by-product (Balachandar et al., 2013). Recently, a study demonstrated the financial viability of producing biohydrogen from liquid pineapple waste (Ahmad et al., 2022). This study analyzed the total production costs, annual sales, profitability, and financial position of biohydrogen production through combined dark fermentation and photofermentation. The expected total profit after taxes for producing 3000 metric tonnes of biohydrogen from liquid pineapple waste was 1.7 times the total capital investment. The return on investment was 68% (Ahmad et al., 2022). Hence, biohydrogen generation from waste was shown to be economically feasible and attractive, positioning it as a critical player in achieving the circular economy.
Machine learning models can be trained on data collected from experiments on different types of organic waste. The data can include the composition of the organic waste, the temperature and pressure used, the duration of the process, and the amount of biohydrogen produced. From these data, machine learning models can learn the parameter values that optimize the conversion process. Models can also identify potential problems in the conversion process, such as the formation of byproducts or the incomplete conversion of organic waste. Using machine learning models, researchers and engineers can identify the most efficient and cost-effective methods for producing biohydrogen from organic waste. The conversion of organic waste to biohydrogen using machine learning models is depicted in Figure 8. A study evaluated the potential to predict biohydrogen output from organic waste by applying ANN and support vector machine (SVM) analyses to experimental data. SVM was found to predict more effectively than ANN. The SVM model had an R² of 0.98 and a root mean square error of 0.01. Subsequently, a genetic algorithm (GA) and particle swarm optimization (PSO) were combined with these models to identify the most effective process settings. While GA and PSO found similar optimal parameter values, the latter was considerably quicker (Mahata et al., 2020).
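The optimization step can be illustrated with a small particle swarm sketch. It is not the procedure of Mahata et al. (2020): `surrogate_yield` is a hypothetical stand-in for a trained SVM or ANN model, the decision variables (pH and temperature), their bounds, and the swarm settings are assumptions chosen for illustration.

```python
import random


def surrogate_yield(ph: float, temp_c: float) -> float:
    """Hypothetical stand-in for a trained model; peaks at pH 5.5 and 37 C."""
    return 2.0 - 0.3 * (ph - 5.5) ** 2 - 0.01 * (temp_c - 37.0) ** 2


def pso_maximize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best particle swarm optimization (maximization)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(*p) for p in pos]      # personal best values
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp the position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(*pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Assumed search ranges: pH 4 to 8, temperature 25 to 45 C.
best_settings, best_yield = pso_maximize(surrogate_yield, [(4.0, 8.0), (25.0, 45.0)])
```

In practice the surrogate would be replaced by the prediction function of the fitted model, so that the swarm searches the model's input space rather than running new experiments.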

The patent landscape for biohydrogen
Patent landscape analysis is used to identify trends in biohydrogen production and to help researchers understand which biohydrogen production methods are feasible. The analysis primarily depends on keywords and the international patent classification (IPC), which allows researchers to narrow down their search accurately. It is carried out stage by stage so that researchers can perform it without specialist expertise. In the first stage, the search uses keywords only; in the next, it combines keywords and IPC for easier interpretation.
Patent landscape analysis was carried out for the keyword "biohydrogen" in the "English All" category with the "single family member" and "stemming" options enabled; the search returned 567 results (as of 17 November 2022) (Fig. 9). The year-based analysis shows consistent growth over the decade. The dominant IPC were C12P, C12N, and A23K, occupying the top three positions. Patent landscape analysis was also carried out for the keyword "biohydrogen production" in the "English All" category with the "single family member" and "stemming" options enabled; the search returned 183 results (as of 17 November 2022) (Fig. 10).
Figure 10 reveals that the United States of America again leads other countries, with a patent filing count of 77. The year-based analysis shows some fluctuations in growth over the decade. The dominant IPC were C12P, C12N, and C12M, occupying the top three positions. Patent landscape analysis was then carried out for the keyword "biohydrogen production" in the "English All" category with IPC "C12P" and with the "single family member" and "stemming" options enabled; the search returned 113 results (as of 19 November 2022).
Figure 11 reveals that the United States of America leads other countries with a patent filing count of 52, and the year-based analysis shows some fluctuations in growth over the decade. The dominant IPC were C12P, C12N, and C12M, occupying the top three positions.

Perspectives of machine learning for biohydrogen production
Machine learning is expected to play a major role in biohydrogen production. Machine learning algorithms can analyze data related to biohydrogen production and identify patterns that help optimize the process. They can be used to identify the most efficient production reactors and parameters, including residence time, space velocity, and surface area, to maximize the hydrogen yield (Deng et al., 2021; Ganguli and Bhatt, 2023). Additionally, machine learning can predict the potential yield of biohydrogen from a given set of inputs, enabling researchers to make more informed decisions about the best production pathways. Furthermore, machine learning can be used to identify potential genetic modifications that could increase the efficiency of biohydrogen production. Overall, machine learning can make biohydrogen production more efficient and cost-effective. Based on the above discussion, we propose different pathways to boost large-scale biohydrogen production using machine learning models, as shown in Figure 12.
Using machine learning to improve biohydrogen production can bring us closer to a promising future for this green vehicle fuel. However, one of the major challenges for machine learning in biohydrogen production is limited data availability. Due to the complexity and variability of the process, it is not easy to generate sufficient data for machine learning algorithms to learn from. Additionally, the available data are often incomplete or biased, making it difficult to train reliable models. Finally, machine learning models must account for non-linear or non-monotonic relationships between input and output variables, which can be difficult to model. One solution to these challenges is to use synthetic data generated by simulation. This can provide a more comprehensive data set for training machine learning algorithms, as well as data that are not biased by experimental conditions. Additionally, combining simulated and real experimental data can help reduce the bias of the data set and provide more accurate results. Finally, advanced algorithms such as deep learning can help account for non-linear relationships between input and output variables.
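A minimal sketch of mixing simulated and experimental records follows. The text does not name a simulator; the modified Gompertz curve, widely used to describe cumulative hydrogen production in batch fermentation, is assumed here. The parameter ranges, field names, and the down-weighting of synthetic rows are likewise illustrative assumptions.

```python
import math
import random


def gompertz(t, p_max, r_max, lag):
    """Modified Gompertz curve: cumulative hydrogen at time t (hours).

    p_max: production potential, r_max: maximum rate, lag: lag time.
    """
    return p_max * math.exp(-math.exp(r_max * math.e / p_max * (lag - t) + 1.0))


def simulate_rows(n, seed=0):
    """Draw n synthetic records from the assumed simulator."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Hypothetical parameter ranges, chosen only for illustration.
        t = rng.uniform(0.0, 48.0)
        p_max = rng.uniform(50.0, 150.0)
        r_max = rng.uniform(2.0, 10.0)
        lag = rng.uniform(1.0, 8.0)
        rows.append({"time_h": t, "p_max": p_max, "r_max": r_max,
                     "lag_h": lag, "hydrogen": gompertz(t, p_max, r_max, lag)})
    return rows


def combine(experimental, synthetic, synthetic_weight=0.5):
    """Merge both sources, tagging provenance and a per-row training weight."""
    merged = [dict(r, source="experiment", weight=1.0) for r in experimental]
    merged += [dict(r, source="simulation", weight=synthetic_weight)
               for r in synthetic]
    return merged


# Hypothetical usage: one measured record plus five simulated ones.
measured = [{"time_h": 10.0, "hydrogen": 42.0}]
training_set = combine(measured, simulate_rows(5, seed=1), synthetic_weight=0.25)
```

Keeping a `source` tag and a separate weight lets a training routine emphasize measured data while still benefiting from the broader coverage of the simulated rows.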

Conclusions
The rising number of scholarly articles and patent applications on biohydrogen suggests that the industry has promising growth potential and that biohydrogen could replace fossil fuels in the not-too-distant future. In particular, when burned, hydrogen fuel yields water as its main product and emits no carbon dioxide. Biohydrogen production can also be implemented in developing countries by using existing bio-wastes.
While there is no single policy on biohydrogen production, many governments are determined to support the development and deployment of renewable energy technologies, including biohydrogen production. One example of a policy that supports biohydrogen production is the European Union's Horizon 2020 research and innovation program, which funds research and demonstration projects related to renewable energy technologies. In the United States, the Department of Energy (DOE) has programs to support research and development of renewable energy technologies. The DOE's Hydrogen and Fuel Cells Program funds research on producing, storing, and using hydrogen as a fuel. In addition, the DOE's Bioenergy Technologies Office supports research on using biomass as a feedstock for economic hydrogen production via fermentation.
The use of machine learning for biohydrogen production is still in its infancy but holds great promise. Machine learning algorithms can analyze complex data sets and identify patterns and correlations. This information can be used to develop models that predict the optimal conditions for biohydrogen production. Machine learning can also identify the most promising strains or organisms for biohydrogen production and optimize the fermentation process. In addition, machine learning can be used to develop efficient process control systems that ensure optimal yields. All of these applications are promising, and with the increased use of machine learning in biohydrogen production, the efficiency and cost-effectiveness of the process will likely improve significantly.
any light energy, it is good for the environment and cheap to run (Azwar et al., 2014). Strict anaerobes produce reduced ferredoxin (Fd(red)) by oxidizing pyruvate to acetyl coenzyme A (acetyl-CoA) and CO2. The Fd(red) is then reoxidized by hydrogenase, releasing hydrogen gas (Jiménez-Llanos et al., 2020; Salakkam et al., 2021). Facultative anaerobes metabolize pyruvate into acetyl-CoA and formate. In the next step, formate hydrogen lyase converts the formate into hydrogen.

Figure 2 depicts the overall dark fermentation route for extracting biohydrogen from biomass.

Fig. 1. Most frequently used keywords in biohydrogen research according to the Scopus database. The network visualization was developed with VOSviewer.

Fig. 3. Photofermentative biohydrogen generation from biomass in a single-step process using purple non-sulfur photosynthetic bacteria.

Fig. 4. Combined dark fermentation and photofermentation for biohydrogen generation from biomass by exploiting organic acids from dark fermentation in the photofermentation process.

Fig. 12. Functions of machine learning models that can foster large-scale biohydrogen production.

Table 1. Comparison of the coverage of the present review with previously published reviews on the application of machine learning in biohydrogen production.