Copyright (c) 2025-2026 jbeenenga j.beenenga@gmail.com Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated ...
The main model consists of a pretrained convolutional encoder that extracts features and a transformer decoder that generates captions. For more information, please refer to the corresponding DCASE task ...