| Description | In the Linux kernel, the following vulnerability has been resolved:  bpf: Fix too early release of tcx_entry  Pedro Pinto and later independently also Hyunwoo Kim and Wongi Lee reported an issue that the tcx_entry can be released too early leading to a use after free (UAF) when an active old-style ingress or clsact qdisc with a shared tc block is later replaced by another ingress or clsact instance.  Essentially, the sequence to trigger the UAF (one example) can be as follows:    1. A network namespace is created   2. An ingress qdisc is created. This allocates a tcx_entry, and      &tcx_entry->miniq is stored in the qdisc's miniqp->p_miniq. At the      same time, a tcf block with index 1 is created.   3. chain0 is attached to the tcf block. chain0 must be connected to      the block linked to the ingress qdisc to later reach the function      tcf_chain0_head_change_cb_del() which triggers the UAF.   4. Create and graft a clsact qdisc. This causes the ingress qdisc      created in step 1 to be removed, thus freeing the previously linked      tcx_entry:       rtnetlink_rcv_msg()        => tc_modify_qdisc()          => qdisc_create()            => clsact_init() [a]          => qdisc_graft()            => qdisc_destroy()              => __qdisc_destroy()                => ingress_destroy() [b]                  => tcx_entry_free()                    => kfree_rcu() // tcx_entry freed    5. Finally, the network namespace is closed. This registers the      cleanup_net worker, and during the process of releasing the      remaining clsact qdisc, it accesses the tcx_entry that was      already freed in step 4, causing the UAF to occur:       cleanup_net()        => ops_exit_list()          => default_device_exit_batch()            => unregister_netdevice_many()              => unregister_netdevice_many_notify()                => dev_shutdown()                  => qdisc_put()                    => clsact_destroy() [c]                      => tcf_block_put_ext()                        => tcf_chain0_head_change_cb_del()                          => tcf_chain_head_change_item()                            => clsact_chain_head_change()                              => mini_qdisc_pair_swap() // UAF  There are also other variants, the gist is to add an ingress (or clsact) qdisc with a specific shared block, then to replace that qdisc, waiting for the tcx_entry kfree_rcu() to be executed and subsequently accessing the current active qdisc's miniq one way or another.  The correct fix is to turn the miniq_active boolean into a counter. What can be observed, at step 2 above, the counter transitions from 0->1, at step [a] from 1->2 (in order for the miniq object to remain active during the replacement), then in [b] from 2->1 and finally [c] 1->0 with the eventual release. The reference counter in general ranges from [0,2] and it does not need to be atomic since all access to the counter is protected by the rtnl mutex. With this in place, there is no longer a UAF happening and the tcx_entry is freed at the correct time. |