Intelligent Optimization Process Through Reinforcement Learning
Introduction
Lighting, both natural and artificial, has become a part of daily life that is taken for granted in modern times. Most people need light to perform their daily activities with ease and comfort. However, the use of light is not limited to visualizing objects; it also extends to medical treatments. Multispectral illumination has the potential to change the general perception of lighting because of its ability to modulate the spectral power distribution (SPD). Modulating or adjusting the produced SPD in an underdetermined setting makes it possible to manipulate the chromaticity point, light quality, and system efficiency according to different operational purposes. Research has investigated the optimization of the spectral composition of multichannel LED luminaires to enable the configuration of favorable light settings. Various methods, including metaheuristic optimization methods, have been used to determine the channel intensities for the spectral output of a luminaire. This project developed and implemented a reinforcement learning-based optimization procedure for computing the spectral composition of multichannel LED luminaires.
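To illustrate why the problem is underdetermined, the following minimal Python sketch models the emitted SPD as a weighted sum of channel basis spectra. The Gaussian channel spectra, wavelength grid, and weights are illustrative assumptions, not data from the luminaire studied here.

```python
import numpy as np

# Minimal sketch of additive spectral mixing for a multichannel LED luminaire.
# The channel spectra and weights below are illustrative placeholders, not
# measurements from the luminaire used in this project.

wavelengths = np.arange(380, 781, 1)      # visible range in nm
num_channels = 15                          # channels of the luminaire

# Hypothetical channel basis spectra: one Gaussian peak per LED channel.
peaks = np.linspace(420, 680, num_channels)
basis = np.exp(-0.5 * ((wavelengths[:, None] - peaks[None, :]) / 15.0) ** 2)

# Channel intensities (duty cycles) in [0, 1] chosen by the optimizer/agent.
weights = np.random.uniform(0.0, 1.0, num_channels)

# The emitted SPD is the weighted sum of the channel spectra. Because many
# different weight vectors can yield the same chromaticity and luminance,
# the inverse problem (target metrics -> channel weights) is underdetermined.
spd = basis @ weights
```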
Methods
The objective was to determine the relationship between the channel settings and the selected target metrics within a dataset. Lighting optimization is an application in which a user may desire more than one light property. Therefore, eight light metrics were considered in this project: three based on visual performance (luminance, CIE-uʹ, and CIE-vʹ) and five based on photoreceptor signals. The investigation examined two models, one for each approach, in which the three and the five objectives, respectively, are optimized simultaneously. Two reinforcement learning (RL) algorithms were investigated to select the appropriate LED channel settings for target metrics defined by the user. The first algorithm, Deep Deterministic Policy Gradient (DDPG), is deterministic, while the second, Proximal Policy Optimization (PPO), is stochastic.
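As a rough illustration of how the task can be framed for an RL agent, the sketch below casts spectral tuning as an environment with a Gym-style reset/step interface. The observation, reward shaping, and the placeholder compute_metrics function are assumptions made for illustration; the actual colorimetric and photoreceptor computations used in the project are not reproduced here.

```python
import numpy as np

class SpectralTuningEnv:
    """Minimal sketch of the optimization task as an RL environment.

    Assumptions (not from the paper): a Gym-style reset/step interface,
    actions interpreted as the channel intensities in [0, 1], and a reward
    equal to the negative error between achieved and target metrics.
    compute_metrics stands in for the real colorimetric/photoreceptor model.
    """

    def __init__(self, num_channels=15, num_metrics=3, tolerance=0.05):
        self.num_channels = num_channels
        self.num_metrics = num_metrics
        self.tolerance = tolerance
        self.target = None

    def compute_metrics(self, channels):
        # Placeholder: a fixed random linear map from channel settings to
        # metrics; the real model would compute luminance, CIE-uʹ, CIE-vʹ
        # or the five photoreceptor signals from the mixed spectrum.
        rng = np.random.default_rng(0)
        mapping = rng.uniform(0.0, 1.0, (self.num_metrics, self.num_channels))
        return mapping @ channels / self.num_channels

    def reset(self, target_metrics=None):
        # A new "level" is defined by the user-specified target metrics.
        if target_metrics is None:
            target_metrics = np.random.uniform(0.2, 0.8, self.num_metrics)
        self.target = np.asarray(target_metrics, dtype=float)
        return self.target.copy()                 # observation: the targets

    def step(self, action):
        channels = np.clip(action, 0.0, 1.0)      # channel intensities
        achieved = self.compute_metrics(channels)
        error = np.abs(achieved - self.target).mean()
        reward = -error                           # closer to target = higher reward
        done = bool(error < self.tolerance)       # level solved within tolerance
        return self.target.copy(), reward, done, {"achieved": achieved}
```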
Results
We found that the best reinforcement learning approach for the task is the PPO algorithm. By optimizing the clipped surrogate objective, the model takes small update steps in the correct direction, which improves the policy steadily and keeps training stable. Despite the successful training, the PPO model does not generalize directly because of the small number of samples the agent perceives. During the evaluation phase, the agent could not solve the unseen levels (desired target metrics) immediately, but after 150 episodes, 84% of the levels were solved. The DDPG model showed the lowest performance and made false predictions even after relocating the added exploration noise from the actions to the network parameters. DDPG could not cope with the state and action space of the 15-channel environment; therefore, we reduced the number of channels to 6. Although reducing the number of channels (actions) should have simplified the environment for the DDPG algorithm, it did not lead to better predictions. Consequently, the DDPG algorithm was not able to solve the task.
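To make the role of the clipped surrogate objective explicit, here is a minimal NumPy sketch of the PPO loss term. The clipping range of 0.2 is the commonly used default and is an assumption here, since the value used in the project is not stated in the text.

```python
import numpy as np

def ppo_clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective used by PPO (illustrative sketch).

    ratio:      pi_new(a|s) / pi_old(a|s) for the sampled actions
    advantage:  estimated advantages for those actions
    clip_eps:   clipping range (0.2 is a common default, assumed here)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the element-wise minimum caps the incentive to move the policy
    # far from the old one, which keeps the update steps small and stable.
    return np.minimum(unclipped, clipped).mean()
```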
Discussion
The complexity of the optimization process increases with the number of LEDs used to adjust a spectrum whose chromaticity point lies on the Planckian locus. In general, the multichannel LED luminaire presents a considerably more challenging learning problem, primarily because of its large number of possible action (channel) combinations. The learning agent is likely to exploit regularities in the task specification and become stuck in local optima. Learning results also tend to be sensitive to the particular algorithm, exploration strategy, reward function, termination condition, and weight initialization. Additional studies must be carried out, since intelligent lighting can improve human well-being and poses several challenging problems, ranging from defining objectives to devising performance metrics.