Description:

In the Linux kernel, the following vulnerability has been resolved:

RDMA/cma: Fix workqueue crash in cma_netevent_work_handler

struct rdma_cm_id has a member "struct work_struct net_work" that is reused each time cma_netevent_work_handler() is enqueued onto cma_wq.

The crash below [1] can occur if more than one call to cma_netevent_callback() happens in quick succession for the same rdma_cm_id. Each call runs INIT_WORK() on the same net_work before enqueuing it, and there is no guarantee that a previously queued work item has already run when the next callback arrives; the second INIT_WORK() therefore overwrites the work item that is still queued, despite id_table_lock being held during the enqueue. The drgn analysis [2] also indicates that the work item was likely overwritten.

Fix this by moving the INIT_WORK() to __rdma_create_id(), so that it cannot race with an existing queue_work() or its worker thread.

[1] Trimmed crash stack:
=============================================
BUG: kernel NULL pointer dereference, address: 0000000000000008
kworker/u256:6 ... 6.12.0-0...
Workqueue:  cma_netevent_work_handler [rdma_cm] (rdma_cm)
RIP: 0010:process_one_work+0xba/0x31a
Call Trace:
 worker_thread+0x266/0x3a0
 kthread+0xcf/0x100
 ret_from_fork+0x31/0x50
 ret_from_fork_asm+0x1a/0x30
=============================================

[2] drgn crash analysis:

>>> trace = prog.crashed_thread().stack_trace()
>>> trace
(0)  crash_setup_regs (./arch/x86/include/asm/kexec.h:111:15)
(1)  __crash_kexec (kernel/crash_core.c:122:4)
(2)  panic (kernel/panic.c:399:3)
(3)  oops_end (arch/x86/kernel/dumpstack.c:382:3)
...
(8)  process_one_work (kernel/workqueue.c:3168:2)
(9)  process_scheduled_works (kernel/workqueue.c:3310:3)
(10) worker_thread (kernel/workqueue.c:3391:4)
(11) kthread (kernel/kthread.c:389:9)

Line workqueue.c:3168 for this kernel version is in process_one_work():
3168	strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);

>>> trace[8]["work"]
*(struct work_struct *)0xffff92577d0a21d8 = {
	.data = (atomic_long_t){
		.counter = (s64)536870912,    <=== Note
	},
	.entry = (struct list_head){
		.next = (struct list_head *)0xffff924d075924c0,
		.prev = (struct list_head *)0xffff924d075924c0,
	},
	.func = (work_func_t)cma_netevent_work_handler+0x0 = 0xffffffffc2cec280,
}

The suspicion is that pwq is NULL:
>>> trace[8]["pwq"]
(struct pool_workqueue *)<absent>

In process_one_work(), pwq is assigned from:
struct pool_workqueue *pwq = get_work_pwq(work);

and get_work_pwq() is:
static struct pool_workqueue *get_work_pwq(struct work_struct *work)
{
	unsigned long data = atomic_long_read(&work->data);

	if (data & WORK_STRUCT_PWQ)
		return work_struct_pwq(data);
	else
		return NULL;
}

WORK_STRUCT_PWQ is 0x4:
>>> print(repr(prog['WORK_STRUCT_PWQ']))
Object(prog, 'enum work_flags', value=4)

But work->data is 536870912, which is 0x20000000, so WORK_STRUCT_PWQ is not set. get_work_pwq() therefore returns NULL and we crash in process_one_work():
3168	strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);
=============================================
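
The bit test in the drgn session can be reproduced in isolation. The following is a small standalone sketch (not part of the report; the constant names merely mirror the drgn output) showing why get_work_pwq() returns NULL for the observed work->data value:

```c
#include <stdio.h>

/* Values taken from the drgn session above. */
#define OBSERVED_WORK_DATA 0x20000000UL  /* .counter = 536870912 */
#define WORK_STRUCT_PWQ    0x4UL         /* enum work_flags, value 4 */

int main(void)
{
	unsigned long data = OBSERVED_WORK_DATA;

	/* Mirrors the check in get_work_pwq(): without the PWQ bit the
	 * function returns NULL, and process_one_work() then dereferences
	 * pwq->wq->name, producing the NULL pointer oops shown in [1]. */
	if (data & WORK_STRUCT_PWQ)
		printf("0x%lx carries a pool_workqueue pointer\n", data);
	else
		printf("0x%lx: WORK_STRUCT_PWQ not set, get_work_pwq() -> NULL\n", data);

	return 0;
}
```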
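
To illustrate the fix itself, here is a minimal before/after sketch of the queuing pattern. The function and member names (cma_netevent_callback, cma_netevent_work_handler, __rdma_create_id, cma_wq, net_work) come from the commit message, but the simplified signatures and bodies below are assumptions for illustration and are not the upstream cma code:

```c
/* Illustrative sketch only; not the upstream cma.c code. */
#include <linux/workqueue.h>
#include <rdma/rdma_cm.h>

static struct workqueue_struct *cma_wq;		/* created elsewhere */
static void cma_netevent_work_handler(struct work_struct *work);

/*
 * Racy pattern (before the fix): every netevent callback re-runs
 * INIT_WORK() on the same net_work, even if that work item is already
 * queued on cma_wq. INIT_WORK() rewrites work->data and work->entry,
 * wiping the pool_workqueue information a queued item carries.
 */
static void netevent_enqueue_racy(struct rdma_cm_id *id)
{
	INIT_WORK(&id->net_work, cma_netevent_work_handler);
	queue_work(cma_wq, &id->net_work);
}

/*
 * Fixed pattern: initialize net_work exactly once at ID creation time
 * (__rdma_create_id() in the real code); later callbacks only queue it.
 * queue_work() is safe to call on an already-pending item, whereas
 * re-initializing a pending item is not.
 */
static void create_id_init(struct rdma_cm_id *id)
{
	INIT_WORK(&id->net_work, cma_netevent_work_handler);
}

static void netevent_enqueue_fixed(struct rdma_cm_id *id)
{
	queue_work(cma_wq, &id->net_work);
}
```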