Welcome to the homepage of Ziqiang Shi (石自强).

About

I currently serve as a Senior Research Scientist and Senior Research Manager at the Fujitsu R&D Center in Beijing. My research interests encompass music, audio, speech, image, and multimodal signal processing. My work has been recognized with several competitive awards, including a Best Paper Award at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024), First Place in the 7th AI City Challenge (2023), and Third Place in the 5th Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2019). I received my Ph.D. in Computer Science from the Harbin Institute of Technology in Spring 2013, under the supervision of Prof. Jiqing Han. Before that, I earned my M.S. in Computer Science from the same institution in Summer 2008, where I worked with Prof. Haifeng Li. I hold a B.S. in Computer Science from Northeastern University in Shenyang, where I was admitted via a special exemption from the National College Entrance Examination (Gaokao). Last, but certainly not least, I married my wonderful wife, Lei Shi, in the summer of 2012. (By the way, I am originally from the suburbs of Yangzhou City, Jiangsu Province.)

Artwork :)

Not mine, but my daughter’s. Enjoy (https://shiziqiang.github.io/paintings_by_anna/).

Publications

(Note: Most of my papers can be found on arXiv.)

Monograph

Jiqing Han, Ziqiang Shi. Theory and Methods of Acoustic Event Detection (声学事件检测理论与方法) [M]. Science Press, 2016. (Purchase link: http://item.jd.com/10563712295.html)

Preprints

Ziqiang Shi, Rujie Liu. Generative Modelling with High-Order Langevin Dynamics. https://arxiv.org/abs/2404.12814
Ziqiang Shi, Rujie Liu, Jiqing Han. LaFurca: Iterative Refined Speech Separation Based on Context-Aware Dual-Path Parallel Bi-LSTM. 2020. https://arxiv.org/abs/2001.08998 (Achieved 20.55 dB SDR improvement, 20.35 dB SI-SDR improvement, a PESQ of 3.69, and an ESTOI of 94.86% on the WSJ0-2mix dataset. The separated voices can be heard on this page: https://shiziqiang.github.io/tastas/)
Ziqiang Shi, Tieran Zheng, Jiqing Han. Identifiability of multivariate logistic mixture models. arxiv.org/abs/1208.3546.

Journal Papers

Liwen Zhang, Jiqing Han, Ziqiang Shi. Learning Temporal Relations from Semantic Neighbors for Acoustic Scene Classification. IEEE Signal Processing Letters. 2020.
Liwen Zhang, Ziqiang Shi, Jiqing Han. Pyramidal Temporal Pooling with Discriminative Mapping for Audio Classification. IEEE/ACM Trans. on Audio, Speech and Language Processing, 2020, DOI:10.1109/TASLP.2020.2966868
Ziqiang Shi, Jiqing Han, Tieran Zheng. Soft Margin Based Low-rank Audio Signal Classification. Neural Processing Letters, 2014, DOI:10.1007/s11063-014-9357-6.
Ziqiang Shi, Jiqing Han, Tieran Zheng, Shiwen Deng. Audio Segment Classification Using Online Learning Based Tensor Representation Feature Discrimination[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(1): 184-194.
Ziqiang Shi, Jiqing Han, Tieran Zheng, Ji Li. Identification of Objectionable Audio Segments Based on Pseudo and Heterogeneous Mixture Models[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(3): 611-623.
Ziqiang Shi, Jiqing Han, Tieran Zheng. Audio Classification with Low-rank Matrix Representation Features[J]. ACM Transactions on Intelligent Systems and Technology, 2014, 5(1).
Ziqiang Shi, Tieran Zheng, Jiqing Han, Boyang Gao. Erotic Audio Recognition Using Heterogeneous Ensemble Classifiers[J]. International Journal of Computer and Electrical Engineering, vol. 4, no. 5, pp. 666-669, 2012.
Ziqiang Shi, Boyang Gao, Tieran Zheng, Jiqing Han, “Study On The Recognition Of Objectionable Audio”, International Journal of Pattern Recognition and Artificial Intelligence, 2010, 24(6): 981-994.
Ziqiang Shi, Haifeng Li, Jiayin Sun, “SVM-Based Recognition of Singing Voice in Popular Music” (in Chinese; 基于SVM的流行音乐中人声的识别), Computer Engineering and Applications (计算机工程与应用), 2008, 44(25): 126-128.

Conference Papers

Ziqiang Shi, Rujie Liu, Jun Takahashi, and Shan Jiang. TrueCount: Improving Open-World Object Counting with Visual-Language Models and Dynamic Multi-Modal Inputs. In Proceedings of the 33rd ACM International Conference on Multimedia (MM ’25, Dublin, Ireland), 1764–1773.
Shijie Nie, Ziqiang Shi, Rujie Liu, Song Guo, Meng Zhang, Mengjiao Wang, Kazuki Osamura, Lina Septiana, Abe Narishige. Attribute Conditional Diffusion-Augmented Person Re-Identification. ICASSP 2025.
Ziqiang Shi, Rujie Liu, Jun Takahashi, Takuma Yamamoto. Bayesian Optimal Latent Projection for Noisy Image Restoration. WACV 2025.
Zhongling Liu, Ziqiang Shi, Rujie Liu, Liu Liu, Takuma Yamamoto, and Daisuke Uchida. Self-Checkout Product Detection with Occlusion Layer Prediction and Intersection Weighting. In 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1742-1747. IEEE, 2024.
Ziqiang Shi, Rujie Liu. Multimedia Generative Modelling with High-Order Langevin Dynamics. ICME 2024.
Ziqiang Shi, Rujie Liu. LangWave: Realistic Voice Generation Based on High-Order Langevin Dynamics. ICASSP 2024.
Ziqiang Shi, Rujie Liu. Noisy Image Restoration Based on Conditional Acceleration Score Approximation. ICASSP 2024.
Ziqiang Shi, Rujie Liu. Conditional Velocity Score Estimation for Image Restoration. WACV 2024. (Best paper award)
Ziqiang Shi, Zhongling Liu, Liu Liu, Rujie Liu, Takuma Yamamoto, Xiaoyu Mi, and Daisuke Uchida. CheckSORT: Refined synthetic data combination and optimized sort for automatic retail checkout. In CVPR Workshop, 2023. (1st prize in the 7th AI CITY CHALLENGE)
Zhongling Liu, Rujie Liu, Ziqiang Shi, Liu Liu, Xiaoyu Mi, Kentaro Murase. Semi-Supervised Contrastive Learning with Soft Mask Attention for Facial Action Unit Detection. ICASSP 2023.
Shoule Wu, Ziqiang Shi. ItoWave: Ito Stochastic Differential Equation Is All You Need For Wave Generation. ICASSP 2022. https://arxiv.org/abs/2201.12519
Zhongling Liu, Ziqiang Shi, Rujie Liu, Liu Liu, Xiaoyu Mi, Kentaro Murase. Expression-assisted facial action unit detection through an attention mechanism and smooth class-weighted loss. Thirteenth International Conference on Signal Processing Systems (ICSPS 2021).
Ziqiang Shi, Liu Liu, Zhongling Liu, Rujie Liu, Xiaoyu Mi, Kentaro Murase. HiCOMEX: Facial action unit recognition based on hierarchy intensity distribution and COMEX relation learning. 2021 4th International Conference on Intelligent Robotics and Control Engineering (IRCE 2021). (Best oral presentation)
Ziqiang Shi, Rujie Liu, Jiqing Han. Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss. Interspeech 2020. https://arxiv.org/abs/2008.03149
Liwen Zhang, Jiqing Han, Ziqiang Shi. ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification. Interspeech 2020.
Ziqiang Shi, Liu Liu, Rujie Liu. Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training. EUSIPCO 2020. https://arxiv.org/abs/2002.06021
Liwen Zhang, Ziqiang Shi, et al. FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks. MMM 2020.
Ziqiang Shi, et al. HODGEPODGE: Sound Event Detection Based on Ensemble of Semi-Supervised Learning Methods. DCASE 2019 Workshop. arxiv.org/abs/1907.07398. (Ranked 3rd in DCASE 2019 Challenge Task 4: “Sound event detection in domestic environments”.)
Ziqiang Shi, et al. FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation. National Conference on Man-Machine Speech Communication (NCMMSC, 全国人机语音通讯学术会议), 2019. arxiv.org/abs/1902.00651
Ziqiang Shi, et al. Is CQT more suitable for monaural speech separation than STFT? An empirical study. National Conference on Man-Machine Speech Communication (NCMMSC, 全国人机语音通讯学术会议), 2019. arxiv.org/abs/1902.00631.
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Jiqing Han, and Anyan Shi. Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation [C]. Interspeech 2019.
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, Shouji Harada and Jiqing Han. End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network [C]. Interspeech 2019.
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa and Jiqing Han. FurcaX: End-to-end monaural speech separation based on deep gated (de)convolutional neural networks with adversarial example training [C]. ICASSP 2019.
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa and Jiqing Han. Deep Clustering With Constant Q Transform For Multi-Talker Single Channel Speech Separation [C]. IEEE FRUCT 2018.
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu. Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings [C]. Interspeech 2018.
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu. Latent Factor Analysis of Deep Bottleneck Features for Speaker Verification with Random Digit Strings [C]. Interspeech 2018.
Ziqiang Shi, Liu Liu, Huibin Lin, Rujie Liu. Joint Learning of J-Vector Extractor and Joint Bayesian Model for Text Dependent Speaker Verification [C]. Interspeech 2018.
Ziqiang Shi, Mengjiao Wang, Liu Liu, Huibin Lin, Rujie Liu. A Double Joint Bayesian Approach for J-Vector Based Text-dependent Speaker Verification[C]. Speaker Odyssey 2018.
Ziqiang Shi, Rujie Liu. A better convergence analysis of the block coordinate descent method for large scale machine learning[C]. ICMLA 2017.
Ziqiang Shi, Liu Liu, Mengjiao Wang, Rujie Liu. Multi-View (Joint) Probability Linear Discrimination Analysis For J-Vector Based Text Dependent Speaker Verification[C]. ASRU 2017.
Ziqiang Shi, Rujie Liu. Online and stochastic Douglas-Rachford splitting method for large scale machine learning[C]. ACML workshop on Learning on big data 2016.
Ziqiang Shi, Rujie Liu. Empirical study of PROXTONE and PROXTONE+ for Fast Learning of Large Scale Sparse Models[C]. ICSP 2016.
Ziqiang Shi, Rujie Liu. Large Scale Optimization with Proximal Stochastic Newton-type Gradient Descent [C]. ECML 2015. (Acceptance rate: 89/383=23%)
Ziqiang Shi, Rujie Liu. Online and Stochastic Universal Gradient Methods for Minimizing Regularized Hölder Continuous Finite Sums in Machine Learning[C]. PAKDD 2015. (Acceptance rate: 90/405=22%)
Ziqiang Shi, Tieran Zheng, Jiqing Han, Ji Li. Guarantees of Augmented Trace Norm Models in Tensor Recovery[C]. IJCAI 2013. (Acceptance rate: 413/1473=28%)
Ziqiang Shi, Tieran Zheng, Jiqing Han, Shiwen Deng. Low-rank Audio Signal Classification Under Soft Margin and Trace Norm Constraints[C]. Interspeech 2012, pp. 2401-2404.
Ziqiang Shi, Jiqing Han, Tieran Zheng, “Audio Scene Recognition Based on Anchor Space” (in Chinese; 基于锚空间的音频场景识别), National Conference on Man-Machine Speech Communication (NCMMSC, 全国人机语音通讯学术会议), 2011.
Ziqiang Shi, Jiqing Han, Tieran Zheng, “A Novel Framework Based on Trace Norm Minimization for Audio Event Detection”, ICONIP 2011, Part II, LNCS 7063, pp. 646-654. Springer, Heidelberg (2011).
Ziqiang Shi, Jiqing Han, Tieran Zheng, “Heterogeneous Mixture Models Using Sparse Representation Features For Applause And Laugh Detection”, IEEE International Workshop on Machine Learning For Signal Processing (MLSP), pp. 1-5, 2011.
Ziqiang Shi, Jiqing Han, Tieran Zheng, “Real-World Speech/Non-Speech Audio Classification Based on Sparse Representation Features and GPCs”, Interspeech 2011, pp. 2401-2404.
Miao Li, Jin Li, Jiqing Han, Ziqiang Shi, “Singing Melody Extraction from Pop Songs Using a Novel Feature and Viterbi Search”, IEEE International Conference on Computational Intelligence and Software Engineering (CiSE), pp. 1-4, 2010.
Jin Li, Jiqing Han, Ziqiang Shi, “An Efficient Approach to Humming Transcription for Query-by-Humming System”, IEEE International Congress on Image and Signal Processing (CISP), pp. 3746-3749, 2010.
Hao Xue, Haifeng Li, Chang Gao, Ziqiang Shi, “Computationally Efficient Audio Segmentation through a Multi-Stage BIC Approach”, IEEE International Conference on Image and Signal Processing (CISP), pp. 3774-3777, 2010.
Ziqiang Shi, Boyang Gao, Tieran Zheng, Jiqing Han, “Objectionable Audio Content Understanding Based On In-Class Clustering Method”, IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC 2009), pp. 712-716, 2009.
Ziqiang Shi, Boyang Gao, Jiqing Han, Zhen Wu, “Study of Objectionable Sound Recognition based on Histogram Features and SVM”, IEEE International Conference on Image and Signal Processing (CISP), pp. 1-4, 2009.

PhD Thesis

Robust Acoustic Event Detection Based on Long-Term Features (基于长时特征的鲁棒声学事件检测).

Reviewing

IEEE Signal Processing Letters, Applied Acoustics, Speech Communication, Multimedia Tools and Applications, IEEE Transactions on Audio, Speech, and Language Processing, Acta Automatica Sinica (自动化学报), Acta Electronica Sinica (电子学报), ECML, AAAI, IJCAI, WACV.

Honors

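Best Paper Award, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)
First Place, 7th AI City Challenge (AI CITY CHALLENGE 2023, https://www.aicitychallenge.org/2023-challenge-winners/)
Associate Researcher (2021)
Young Talent, International High-End Business Talent Program of Chaoyang District, Beijing (2019)
Third Place, 5th Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2019, https://dcase.community/challenge2019/)
Fujitsu R&D Center General Manager’s Special Award: Award for Local Promotion and Application of Information Processing Technology (2015)
Fujitsu R&D Center General Manager’s Special Award: Outstanding Team Contribution Award (2014)
Nomination for Outstanding Doctoral Dissertation, Harbin Institute of Technology (2014; 3/42, School of Computer Science)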

Contact me

TEL: +86-13621160486

E-mails: shiziqiang7@gmail.com and shiziqiang@fujitsu.com.

Blog: http://blog.sciencenet.cn/u/Riemann7.