| Description | In the Linux kernel, the following vulnerability has been resolved:  btrfs: zoned: fix extent range end unlock in cow_file_range()  Running generic/751 on the for-next branch often results in a hang like below. They are both stack by locking an extent. This suggests someone forget to unlock an extent.    INFO: task kworker/u128:1:12 blocked for more than 323 seconds.         Not tainted 6.13.0-BTRFS-ZNS+ #503   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.   task:kworker/u128:1  state:D stack:0     pid:12    tgid:12    ppid:2      flags:0x00004000   Workqueue: btrfs-fixup btrfs_work_helper [btrfs]   Call Trace:    <TASK>    __schedule+0x534/0xdd0    schedule+0x39/0x140    __lock_extent+0x31b/0x380 [btrfs]    ? __pfx_autoremove_wake_function+0x10/0x10    btrfs_writepage_fixup_worker+0xf1/0x3a0 [btrfs]    btrfs_work_helper+0xff/0x480 [btrfs]    ? lock_release+0x178/0x2c0    process_one_work+0x1ee/0x570    ? srso_return_thunk+0x5/0x5f    worker_thread+0x1d1/0x3b0    ? __pfx_worker_thread+0x10/0x10    kthread+0x10b/0x230    ? __pfx_kthread+0x10/0x10    ret_from_fork+0x30/0x50    ? __pfx_kthread+0x10/0x10    ret_from_fork_asm+0x1a/0x30    </TASK>   INFO: task kworker/u134:0:184 blocked for more than 323 seconds.         Not tainted 6.13.0-BTRFS-ZNS+ #503   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.   task:kworker/u134:0  state:D stack:0     pid:184   tgid:184   ppid:2      flags:0x00004000   Workqueue: writeback wb_workfn (flush-btrfs-4)   Call Trace:    <TASK>    __schedule+0x534/0xdd0    schedule+0x39/0x140    __lock_extent+0x31b/0x380 [btrfs]    ? __pfx_autoremove_wake_function+0x10/0x10    find_lock_delalloc_range+0xdb/0x260 [btrfs]    writepage_delalloc+0x12f/0x500 [btrfs]    ? srso_return_thunk+0x5/0x5f    extent_write_cache_pages+0x232/0x840 [btrfs]    btrfs_writepages+0x72/0x130 [btrfs]    do_writepages+0xe7/0x260    ? srso_return_thunk+0x5/0x5f    ? lock_acquire+0xd2/0x300    ? srso_return_thunk+0x5/0x5f    ? find_held_lock+0x2b/0x80    ? wbc_attach_and_unlock_inode.part.0+0x102/0x250    ? wbc_attach_and_unlock_inode.part.0+0x102/0x250    __writeback_single_inode+0x5c/0x4b0    writeback_sb_inodes+0x22d/0x550    __writeback_inodes_wb+0x4c/0xe0    wb_writeback+0x2f6/0x3f0    wb_workfn+0x32a/0x510    process_one_work+0x1ee/0x570    ? srso_return_thunk+0x5/0x5f    worker_thread+0x1d1/0x3b0    ? __pfx_worker_thread+0x10/0x10    kthread+0x10b/0x230    ? __pfx_kthread+0x10/0x10    ret_from_fork+0x30/0x50    ? __pfx_kthread+0x10/0x10    ret_from_fork_asm+0x1a/0x30    </TASK>  This happens because we have another success path for the zoned mode. When there is no active zone available, btrfs_reserve_extent() returns -EAGAIN. In this case, we have two reactions.  (1) If the given range is never allocated, we can only wait for someone     to finish a zone, so wait on BTRFS_FS_NEED_ZONE_FINISH bit and retry     afterward.  (2) Or, if some allocations are already done, we must bail out and let     the caller to send IOs for the allocation. This is because these IOs     may be necessary to finish a zone.  The commit 06f364284794 ("btrfs: do proper folio cleanup when cow_file_range() failed") moved the unlock code from the inside of the loop to the outside. So, previously, the allocated extents are unlocked just after the allocation and so before returning from the function. However, they are no longer unlocked on the case (2) above. That caused the hang issue.  Fix the issue by modifying the 'end' to the end of the allocated range. Then, we can exit the loop and the same unlock code can properly handle the case. |