Generalization of CNNs on Relational Reasoning with Bar Charts

Zhenxing Cui1     Lu Chen2     Yunhai Wang3     Daniel Haehn4     Yong Wang5     Hanspeter Pfister6    

1Shandong University     2Zhejiang University     3Renmin University of China     4University of Massachusetts Boston     5Nanyang Technological University     6Harvard University

Accepted to IEEE Transactions on Visualization and Computer Graphics

Figure 1: In the position-length experiment, we used five types of bar charts as stimuli, each including colorized bars, axes, tick labels, and titles. (a-h) Mean MLAE values produced by CNNs in generalization tests across eight parameters on these five types of bar charts; each curve shows how the MLAE changes as the corresponding parameter value is increased or decreased, and the dotted lines indicate the MLAE computed on stimuli with unperturbed parameters. (i) Estimated MLAE values for bar charts whose bar lengths are encoded with different value ranges.


Abstract:

This paper presents a systematic study of the generalization of convolutional neural networks (CNNs) and humans on relational reasoning tasks with bar charts. We first revisit previous experiments on graphical perception and update the benchmark performance of CNNs. We then test the generalization performance of CNNs on a classic relational reasoning task, estimating bar length ratios in a bar chart, by progressively perturbing the standard visualizations. We further conduct a user study to compare the performance of CNNs and humans. Our results show that CNNs outperform humans only when the training and test data share the same visual encodings; otherwise, they may perform worse. We also find that CNNs are sensitive to perturbations in various visual encodings, regardless of their relevance to the target bars, whereas humans are mainly influenced by bar lengths. Our study suggests that robust relational reasoning with visualizations remains challenging for CNNs. Improving CNNs' generalization performance may require training them to better recognize task-related visual properties.

Source Code: github.com/Ideas-Laboratory/Graphical-Perception
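The task in our experiments follows the classic position-length setting: a network receives a rendered bar chart and must estimate the ratio between two marked target bars. Below is a minimal sketch, not the authors' actual generator (see the repository above for that), of how such a stimulus and its ratio label could be produced; the chart size, bar value range, and dot markers are illustrative assumptions.

```python
# Minimal sketch of a ratio-estimation stimulus generator (illustrative only).
# Assumes the Agg backend so the figure can be rasterized off-screen.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

def make_stimulus(n_bars=10, seed=0):
    rng = np.random.default_rng(seed)
    heights = rng.uniform(10.0, 90.0, size=n_bars)       # assumed value range
    i, j = rng.choice(n_bars, size=2, replace=False)     # pick two target bars
    label = min(heights[i], heights[j]) / max(heights[i], heights[j])

    fig, ax = plt.subplots(figsize=(1.5, 1.5), dpi=100)  # ~150x150 px image
    ax.bar(range(n_bars), heights, color="steelblue")
    ax.plot([i, j], [2, 2], "k.", markersize=3)          # dots mark the targets
    ax.set_title("stimulus")
    fig.canvas.draw()
    image = np.asarray(fig.canvas.buffer_rgba())[..., :3]  # HxWx3 uint8 array
    plt.close(fig)
    return image, label
```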




Results:





Figure 2: Performance comparison of eight network architectures, each trained with eight sets of hyper-parameters, on type 1 of the position-length experiment. The best-trained model for each network is highlighted and labeled with its MLAE value; the others are shaded.
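The MLAE values reported throughout follow the measure commonly used in graphical perception studies since Cleveland and McGill: the base-2 log of the absolute estimation error, offset by 1/8. A small sketch, assuming estimates are expressed as percentages in [0, 100]:

```python
import numpy as np

def mlae(predicted, true):
    """Mean log absolute error: mean(log2(|predicted - true| + 1/8)).
    Assumes predicted and true are ratio estimates in percent (0-100)."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean(np.log2(np.abs(predicted - true) + 0.125)))
```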



Figure 3: Comparison of human and CNN performance on bar chart tasks: mean MLAE values and 95% confidence intervals produced by humans and CNNs on five types of bar charts, without and with the largest-level perturbations on eight parameters.
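One standard way to obtain such confidence intervals is a nonparametric bootstrap over per-trial MLAE scores; the sketch below is an assumption about the procedure, not a statement of the paper's exact analysis:

```python
import numpy as np

def bootstrap_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    """Mean and percentile-bootstrap (1 - alpha) CI of per-trial MLAE scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    means = [rng.choice(scores, size=scores.size, replace=True).mean()
             for _ in range(n_boot)]
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return scores.mean(), (lo, hi)
```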



Figure 4: Improving the generalization performance of CNNs by providing segmentation masks. (a) A chart blended with its segmentation mask, in which the area of the target bars is shaded; (b) a perturbed chart blended with its Grad-CAM map, where the high-intensity region is shaded; (c, d) curves showing how IoU scores (c) and MLAE values (d) change across different levels of perturbation of one of the eight parameters.
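The IoU score in (c) compares where the network attends with where it should attend: the Grad-CAM map is thresholded into a binary region and intersected with the ground-truth target-bar mask. A minimal sketch; the 0.5 threshold is an assumption, not necessarily the setting used in the paper:

```python
import numpy as np

def gradcam_iou(cam, mask, threshold=0.5):
    """IoU between the thresholded Grad-CAM map and the target-bar mask.
    cam: float array normalized to [0, 1]; mask: boolean segmentation."""
    region = cam >= threshold
    mask = mask.astype(bool)
    intersection = np.logical_and(region, mask).sum()
    union = np.logical_or(region, mask).sum()
    return float(intersection) / union if union > 0 else 0.0
```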



Figure 5: Improving the generalization performance of CNNs by augmenting the training data with different perturbations. The heatmap shows average MLAE values for CNNs tested on chart images with different perturbations; columns correspond to networks trained with different augmentations, and rows to different test conditions. Cells highlighted in bold red indicate that the test stimuli have perturbations similar to those in the training stimuli.
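Conceptually, this augmentation re-renders each training chart with a visual parameter perturbed, so the network sees the same task under varied encodings. The parameter names and ranges in the sketch below are illustrative placeholders, not the paper's actual eight parameters or levels:

```python
import random

# Hypothetical perturbation set; the actual eight parameters and their
# ranges are those defined by the experiment stimuli, not the values below.
PERTURBATIONS = {
    "bar_width":    lambda spec: spec.update(bar_width=spec["bar_width"] * random.uniform(0.5, 1.5)),
    "stroke_width": lambda spec: spec.update(stroke_width=random.uniform(0.5, 3.0)),
    "title_shift":  lambda spec: spec.update(title_shift=random.uniform(-10.0, 10.0)),
}

def augment(spec):
    """Return a copy of a chart specification with one random perturbation applied."""
    spec = dict(spec)
    random.choice(list(PERTURBATIONS.values()))(spec)
    return spec
```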



Materials:





Paper (10.0 MB)     Supp. (0.89 MB)

Acknowledgements:

This work was partially supported by grants from the NSFC (No. 62132017), the Shandong Provincial Natural Science Foundation (No. ZQ2022JQ32), the Fundamental Research Funds for the Central Universities, the Research Funds of Renmin University of China, and the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 2 (No. T2EP20222-0049).