An adaptive k nearest neighbour method for imputation of missing traffic data based on two similarity.


  • Yang Wang, Beijing Engineering Research Centre of Urban Transport Operation Guarantee, Beijing University of Technology, Beijing
  • Yu Xiao, Beijing Engineering Research Centre of Urban Transport Operation Guarantee, Beijing University of Technology, Beijing
  • Jianhui Lai, Beijing Engineering Research Centre of Urban Transport Operation Guarantee, Beijing University of Technology, Beijing
  • Yanyan Chen, Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing



missing traffic data, similarity metrics, K-nearest neighbour method, stochastic characteristics


Traffic flow is one of the fundamental parameters for traffic analysis and planning. With the rapid development of intelligent transportation systems, large numbers of detectors of various types have been deployed on urban roads. As a result, a huge amount of traffic flow data is now available. However, the traffic flow data collected by these detectors are often degraded by missing values, which can lead to erroneous analyses and decisions if they are not handled appropriately. To remedy this issue, considerable research effort has been made and various imputation techniques have been proposed in recent years. Among them, the k nearest neighbour (kNN) algorithm has become popular because it is easy to implement and imputes missing data effectively. In this paper, we first analyse the stochastic characteristics of traffic flow, to which the shortcomings of the kNN algorithm can be attributed. This motivates an improved algorithm that also eliminates the need to predefine parameters. The parameter-free algorithm introduces a new similarity metric and combines it with the conventional metric. This avoids parameter settings that usually require adequate domain knowledge. Unlike the conventional kNN algorithm, the proposed algorithm employs a multivariate linear regression model to estimate the weights for the final output, based on data smoothed by a wavelet technique. A series of experiments has been performed on traffic flow data from several different countries to examine the adaptive determination of parameters and the smoothing effect. Additional experiments evaluate the performance of the proposed algorithm against a number of widely used imputation algorithms.
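The abstract only outlines the pipeline. As a rough illustration, the sketch below combines the three ingredients it names: wavelet smoothing, a nearest-neighbour search, and neighbour weights estimated by multivariate linear regression. It is a sketch under stated assumptions, not the authors' algorithm. It uses only the conventional Euclidean similarity, because the paper's second similarity metric is not described here and is not reproduced. It also fixes k by hand, although the proposed method determines its parameters adaptively. A one-level Haar wavelet with soft thresholding stands in for the unspecified wavelet technique. The function names `haar_smooth` and `knn_regression_impute` and the toy data are hypothetical.

```python
import numpy as np


def haar_smooth(x, threshold):
    """One-level Haar wavelet denoising with soft-thresholded detail coefficients.

    Assumes len(x) is even. threshold=0 reconstructs x exactly;
    a large threshold reduces x to pairwise means.
    """
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    # Soft thresholding shrinks small (noise-like) detail coefficients towards zero.
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out


def knn_regression_impute(target, history, k, threshold=0.0):
    """Fill NaN entries of `target` using its k most similar historical profiles.

    Hypothetical helper, not the paper's method.
    target:  1-D array of length T (T even); NaN marks a missing interval.
    history: 2-D array of shape (n_days, T) holding complete profiles.
    """
    target = np.asarray(target, dtype=float)
    observed = ~np.isnan(target)
    # Smooth every historical profile before using it for matching and regression.
    smoothed = np.array([haar_smooth(row, threshold) for row in history])
    # Conventional similarity only: Euclidean distance over the observed intervals.
    dist = np.linalg.norm(smoothed[:, observed] - target[observed], axis=1)
    nearest = smoothed[np.argsort(dist)[:k]]  # shape (k, T)
    # Weights from a multivariate linear regression (least squares, no intercept):
    # target[observed] is approximated by nearest[:, observed].T @ w.
    w, *_ = np.linalg.lstsq(nearest[:, observed].T, target[observed], rcond=None)
    filled = target.copy()
    filled[~observed] = nearest[:, ~observed].T @ w
    return filled


# Hypothetical toy data: four historical days of 8 intervals each.
history = np.array([
    [100, 120, 140, 160, 180, 200, 220, 240],
    [110, 130, 150, 170, 190, 210, 230, 250],
    [500] * 8,
    [0] * 8,
], dtype=float)
target = np.array([105, 125, 145, np.nan, 185, 205, np.nan, 245])
filled = knn_regression_impute(target, history, k=2)
print(filled[[3, 6]])  # values close to 165 and 225
```

The regression has no intercept and is fitted only on the observed intervals, so the weights need not sum to one. In the toy data, the target is exactly the mean of its two nearest days, so both fitted weights come out close to 0.5.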





2020-06-30 — Updated on 2024-02-06




Original articles

How to Cite

Wang, Y., Xiao, Y., Lai, J., & Chen, Y. (2024). An adaptive k nearest neighbour method for imputation of missing traffic data based on two similarity. Archives of Transport, 54(2), 59-73.

