Researchers from China recently tested a hybrid model that combines a convolutional neural network and a gated recurrent unit to identify aggressive behaviour in group-housed pigs. They also integrated a spatio-temporal attention mechanism into the model to improve its recognition of aggressive behaviour.
The convolutional neural network acted as a spatial feature extractor to learn appearance representations of behaviour in each individual video frame. The gated recurrent unit network functioned as a temporal feature extractor to learn motion representations of behaviour in a behaviour episode.
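The division of labour between the two networks can be illustrated with a minimal sketch. It is not the researchers' implementation: the tiny frame size, the single convolution layer with global average pooling standing in for the convolutional neural network, the random untrained weights, and every function and variable name are assumptions chosen for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def frame_features(frame, kernels):
    """Stand-in for the CNN: slide each kernel over the frame (cross-correlation,
    as in most deep-learning libraries), apply ReLU, then average over positions."""
    height, width = len(frame), len(frame[0])
    features = []
    for kernel in kernels:
        kh, kw = len(kernel), len(kernel[0])
        total, count = 0.0, 0
        for i in range(height - kh + 1):
            for j in range(width - kw + 1):
                s = sum(kernel[a][b] * frame[i + a][j + b]
                        for a in range(kh) for b in range(kw))
                total += max(s, 0.0)
                count += 1
        features.append(total / count)
    return features  # one appearance value per kernel

def gru_step(x, h, p):
    """One GRU update (biases omitted). Convention used here:
    h_new = (1 - z) * h_old + z * candidate; some libraries swap the roles of z."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], reset_h))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def encode_clip(frames, kernels, params, hidden_size):
    """Spatial features per frame, then temporal integration across frames."""
    h = [0.0] * hidden_size
    for frame in frames:
        h = gru_step(frame_features(frame, kernels), h, params)
    return h

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy setup: 5 frames of 8x8 pixels, 3 kernels, 4 hidden units.
rng = random.Random(0)
n_kernels, hidden_size = 3, 4
kernels = [random_matrix(rng, 3, 3) for _ in range(n_kernels)]
params = {name: random_matrix(rng, hidden_size,
                              n_kernels if name.startswith("W") else hidden_size)
          for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
frames = [random_matrix(rng, 8, 8) for _ in range(5)]

clip_state = encode_clip(frames, kernels, params, hidden_size)
readout = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]  # untrained
p_aggressive = sigmoid(sum(w * h for w, h in zip(readout, clip_state)))
```

With untrained weights the value of p_aggressive is meaningless; the sketch only shows the data flow, in which each frame is reduced to an appearance vector and the recurrent unit accumulates those vectors into a single clip-level state that a classifier can read.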
The researchers defined aggressive behaviour of group-housed pigs as biting, knocking, trampling and chasing. They defined mounting, playing, lying, feeding and drinking as non-aggressive behaviour. The team observed 541 video recordings of aggressive behaviour and 565 video recordings of non-aggressive behaviour, each with a duration of 3 seconds.
The researchers fed the video frames into the model. A spatial feature extractor, integrated with a spatial attention mechanism, extracted behavioural representations from each individual frame. The role of the spatial attention mechanism is to strengthen the feature expression of the area where aggressive behaviour occurs and to weaken the feature expression of irrelevant areas. A temporal feature extractor, integrated with a temporal attention mechanism, extracted motion representations of behaviour from the spatial features across frames. The team identified aggressive and non-aggressive behaviour based on these features.
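The two attention steps can be sketched with a generic soft-attention function. The article does not give the exact formulation the researchers used, so the dot-product scoring, the query vectors (which would be learned in a real model) and all names and numbers below are assumptions for illustration only.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(vectors, query):
    """Soft attention: score each vector against the query, normalise the
    scores with softmax, and return the weights and the weighted sum."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return weights, pooled

# Spatial attention within one frame: four image regions, two channels each.
# The third region has strong activations (hypothetically, where pigs interact).
regions = [[0.1, 0.0], [0.2, 0.1], [3.0, 2.0], [0.0, 0.1]]
spatial_query = [1.0, 1.0]  # hypothetical; learned in a real model
spatial_weights, frame_feature = attend(regions, spatial_query)

# Temporal attention across frames: three per-frame states, the middle one
# carrying most of the motion information in this toy example.
frame_states = [[0.0, 0.1], [0.9, 0.8], [0.1, 0.0]]
temporal_query = [2.0, 2.0]  # hypothetical; learned in a real model
temporal_weights, clip_feature = attend(frame_states, temporal_query)
```

In this toy example the third region and the middle frame receive most of the weight, so they dominate the pooled features while the remaining regions and frames are suppressed. The same function serves both steps; only what is being weighted changes, from regions within a frame to frames within a clip.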
The baseline model had the lowest recognition accuracy. Adding the spatial attention mechanism improved the recognition accuracy on the test dataset by 2.2% and optimised the spatial feature extraction. Adding temporal attention also improved performance compared to the baseline model, but it did not optimise the spatial feature extractor. Integrating both spatial and temporal attention mechanisms into the model increased the recognition accuracy to 94.8%, which supports the validity of the spatio-temporal attention model.
The authors concluded that the variant combining spatial and temporal attention mechanisms in a hybrid model of convolutional neural network and gated recurrent unit has the best recognition performance, which they take as evidence of the model's effectiveness.
However, further research using a multi-camera data collection method is required to clearly capture information such as the body parts pigs use during aggressive behaviour.