* Re: uring regression - lost write request
From: Pavel Begunkov @ 2021-10-22  9:10 UTC
To: Daniel Black, linux-block; +Cc: io-uring

On 10/22/21 04:12, Daniel Black wrote:
> Sometime after 5.11, and fixed in 5.15-rcX (rc6 extensively tested
> over the last few days), is a kernel regression we are tracing in
> https://jira.mariadb.org/browse/MDEV-26674 and
> https://jira.mariadb.org/browse/MDEV-26555
> 5.10 and earlier across many distros and hardware appear not to have the problem.
>
> I'd appreciate some help identifying a 5.14 linux stable patch
> suitable as I observe the fault in mainline 5.14.14 (built

Cc: [email protected]

Let me try to remember anything relevant from 5.15.
Thanks for letting us know.

--
Pavel Begunkov
* Re: uring regression - lost write request
From: Pavel Begunkov @ 2021-10-25  9:57 UTC
To: Daniel Black, linux-block; +Cc: io-uring

On 10/22/21 10:10, Pavel Begunkov wrote:
> On 10/22/21 04:12, Daniel Black wrote:
>> Sometime after 5.11, and fixed in 5.15-rcX (rc6 extensively tested
>> over the last few days), is a kernel regression we are tracing in
>> https://jira.mariadb.org/browse/MDEV-26674 and
>> https://jira.mariadb.org/browse/MDEV-26555
>> 5.10 and earlier across many distros and hardware appear not to have the problem.
>>
>> I'd appreciate some help identifying a 5.14 linux stable patch
>> suitable as I observe the fault in mainline 5.14.14 (built
>
> Cc: [email protected]
>
> Let me try to remember anything relevant from 5.15.
> Thanks for letting us know.

Daniel, following the links I found this:

"From: Daniel Black <[email protected]>
...
The good news is I've validated that the linux mainline 5.14.14 build
from https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.14.14/ has
actually fixed this problem."

To be clear, is the mainline 5.14 kernel affected by the issue?
Or does the problem exist only in debian/etc. kernel trees?

--
Pavel Begunkov
* Re: uring regression - lost write request
From: Daniel Black @ 2021-10-25 11:09 UTC
To: Pavel Begunkov; +Cc: linux-block, io-uring

On Mon, Oct 25, 2021 at 8:59 PM Pavel Begunkov <[email protected]> wrote:
> [...]
>
> Daniel, following the links I found this:
>
> "From: Daniel Black <[email protected]>
> ...
> The good news is I've validated that the linux mainline 5.14.14 build
> from https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.14.14/ has
> actually fixed this problem."
>
> To be clear, is the mainline 5.14 kernel affected by the issue?
> Or does the problem exist only in debian/etc. kernel trees?

Thanks, Pavel, for looking.

I'm retesting https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.14.14/
in earnest. I did get some assertions, but they may have been
unrelated. The testing continues...

The problem with Debian trees on 5.14.12 (as
linux-image-5.14.0-3-amd64_5.14.12-1_amd64.deb) was quite real:
https://jira.mariadb.org/browse/MDEV-26674?focusedCommentId=203155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-203155

What is concrete is the fc34 package of 5.14.14 (which does carry a
Red Hat delta,
https://src.fedoraproject.org/rpms/kernel/blob/f34/f/patch-5.14-redhat.patch,
though I am unsure of its significance). Output below:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1847210

$ uname -a
Linux localhost.localdomain 5.14.14-200.fc34.x86_64 #1 SMP Wed Oct 20 16:15:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

~/repos/mariadb-server-10.6 10.6
$ bd

~/repos/build-mariadb-server-10.6
$ mysql-test/mtr --parallel=4 encryption.innochecksum{,,,,,}
Logging: /home/dan/repos/mariadb-server-10.6/mysql-test/mariadb-test-run.pl --parallel=4 encryption.innochecksum encryption.innochecksum encryption.innochecksum encryption.innochecksum encryption.innochecksum encryption.innochecksum
vardir: /home/dan/repos/build-mariadb-server-10.6/mysql-test/var
Removing old var directory...
 - WARNING: Using the 'mysql-test/var' symlink
The destination for symlink /home/dan/repos/build-mariadb-server-10.6/mysql-test/var does not exist; Removing it and creating a new var directory
Creating var directory '/home/dan/repos/build-mariadb-server-10.6/mysql-test/var'...
Checking supported features...
MariaDB Version 10.6.5-MariaDB
 - SSL connections supported
 - binaries built with wsrep patch
Collecting tests...
Installing system database...

==============================================================================

TEST                                     WORKER RESULT   TIME (ms) or COMMENT
------------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 16040..16059
worker[2] Using MTR_BUILD_THREAD 301, with reserved ports 16020..16039
worker[4] Using MTR_BUILD_THREAD 303, with reserved ports 16060..16079
encryption.innochecksum '16k,cbc,innodb,strict_crc32' w3 [ pass ]  5460
encryption.innochecksum '16k,cbc,innodb,strict_crc32' w2 [ pass ]  5418
encryption.innochecksum '16k,cbc,innodb,strict_crc32' w1 [ pass ]  9391
encryption.innochecksum '16k,cbc,innodb,strict_crc32' w3 [ pass ]  8682
encryption.innochecksum '16k,cbc,innodb,strict_crc32' w3 [ pass ]  3873
encryption.innochecksum '8k,cbc,innodb,strict_crc32' w1 [ pass ]  9133
encryption.innochecksum '4k,cbc,innodb,strict_crc32' w2 [ pass ]  11074
encryption.innochecksum '8k,cbc,innodb,strict_crc32' w1 [ pass ]  5253
encryption.innochecksum '16k,cbc,innodb,strict_full_crc32' w3 [ pass ]  4019
encryption.innochecksum '4k,cbc,innodb,strict_crc32' w2 [ pass ]  6318
encryption.innochecksum '16k,cbc,innodb,strict_full_crc32' w3 [ pass ]  6176
encryption.innochecksum '8k,cbc,innodb,strict_crc32' w1 [ pass ]  7305
encryption.innochecksum '16k,cbc,innodb,strict_full_crc32' w3 [ pass ]  4430
encryption.innochecksum '4k,cbc,innodb,strict_crc32' w2 [ pass ]  10005
encryption.innochecksum '8k,cbc,innodb,strict_crc32' w1 [ pass ]  6878
encryption.innochecksum '16k,cbc,innodb,strict_full_crc32' w3 [ pass ]  3613
encryption.innochecksum '16k,cbc,innodb,strict_full_crc32' w3 [ pass ]  3875
encryption.innochecksum '4k,cbc,innodb,strict_crc32' w2 [ pass ]  6612
encryption.innochecksum '8k,cbc,innodb,strict_crc32' w1 [ pass ]  4901
encryption.innochecksum '16k,cbc,innodb,strict_full_crc32' w3 [ pass ]  3853
encryption.innochecksum '8k,cbc,innodb,strict_crc32' w1 [ pass ]  5080
encryption.innochecksum '4k,cbc,innodb,strict_crc32' w2 [ pass ]  7072
encryption.innochecksum '4k,cbc,innodb,strict_crc32' w2 [ pass ]  6774
encryption.innochecksum '4k,cbc,innodb,strict_full_crc32' w3 [ pass ]  7037
encryption.innochecksum '8k,cbc,innodb,strict_full_crc32' w1 [ pass ]  4961
encryption.innochecksum '8k,cbc,innodb,strict_full_crc32' w1 [ pass ]  5692
encryption.innochecksum '4k,cbc,innodb,strict_full_crc32' w3 [ pass ]  8449
encryption.innochecksum '16k,ctr,innodb,strict_crc32' w2 [ pass ]  5515
encryption.innochecksum '8k,cbc,innodb,strict_full_crc32' w1 [ pass ]  5650
encryption.innochecksum '16k,ctr,innodb,strict_crc32' w2 [ pass ]  3722
encryption.innochecksum '4k,cbc,innodb,strict_full_crc32' w3 [ pass ]  6691
encryption.innochecksum '8k,cbc,innodb,strict_full_crc32' w1 [ pass ]  4611
encryption.innochecksum '16k,ctr,innodb,strict_crc32' w2 [ pass ]  4587
encryption.innochecksum '16k,ctr,innodb,strict_crc32' w2 [ pass ]  5465
encryption.innochecksum '8k,cbc,innodb,strict_full_crc32' w1 [ pass ]  6900
encryption.innochecksum '4k,cbc,innodb,strict_full_crc32' w3 [ pass ]  8333
encryption.innochecksum '16k,ctr,innodb,strict_crc32' w2 [ pass ]  4691
encryption.innochecksum '8k,cbc,innodb,strict_full_crc32' w1 [ pass ]  5077
encryption.innochecksum '4k,cbc,innodb,strict_full_crc32' w3 [ pass ]  6319
encryption.innochecksum '16k,ctr,innodb,strict_crc32' w2 [ pass ]  4590
encryption.innochecksum '4k,ctr,innodb,strict_crc32' w1 [ pass ]  9683
encryption.innochecksum '8k,ctr,innodb,strict_crc32' w2 [ pass ]  5404
encryption.innochecksum '4k,ctr,innodb,strict_crc32' w1 [ pass ]  6775
encryption.innochecksum '8k,ctr,innodb,strict_crc32' w2 [ pass ]  6190
encryption.innochecksum '4k,ctr,innodb,strict_crc32' w1 [ pass ]  9354
encryption.innochecksum '8k,ctr,innodb,strict_crc32' w2 [ pass ]  7734
encryption.innochecksum '8k,ctr,innodb,strict_crc32' w2 [ pass ]  4993
encryption.innochecksum '4k,ctr,innodb,strict_crc32' w1 [ pass ]  6280
encryption.innochecksum '8k,ctr,innodb,strict_crc32' w2 [ pass ]  4487
encryption.innochecksum '4k,ctr,innodb,strict_crc32' w1 [ pass ]  6971
encryption.innochecksum '8k,ctr,innodb,strict_crc32' w2 [ pass ]  5172
encryption.innochecksum '4k,ctr,innodb,strict_crc32' w1 [ pass ]  6317
encryption.innochecksum '16k,ctr,innodb,strict_full_crc32' w2 [ pass ]  3371
encryption.innochecksum '16k,ctr,innodb,strict_full_crc32' w2 [ pass ]  3472
encryption.innochecksum '16k,ctr,innodb,strict_full_crc32' w2 [ pass ]  6707
encryption.innochecksum '4k,ctr,innodb,strict_full_crc32' w1 [ pass ]  9337
encryption.innochecksum '16k,ctr,innodb,strict_full_crc32' w2 [ pass ]  9176
encryption.innochecksum '4k,ctr,innodb,strict_full_crc32' w1 [ pass ]  11817
encryption.innochecksum '16k,ctr,innodb,strict_full_crc32' w2 [ pass ]  3419
encryption.innochecksum '16k,ctr,innodb,strict_full_crc32' w2 [ pass ]  5256
encryption.innochecksum '4k,ctr,innodb,strict_full_crc32' w1 [ pass ]  9291
encryption.innochecksum '4k,ctr,innodb,strict_full_crc32' w1 [ pass ]  6508
encryption.innochecksum '4k,ctr,innodb,strict_full_crc32' w2 [ pass ]  6294
encryption.innochecksum '4k,ctr,innodb,strict_full_crc32' w1 [ pass ]  6327
encryption.innochecksum '8k,ctr,innodb,strict_full_crc32' w2 [ pass ]  4579
encryption.innochecksum '8k,ctr,innodb,strict_full_crc32' w1 [ pass ]  4764
encryption.innochecksum '8k,ctr,innodb,strict_full_crc32' w2 [ pass ]  4469
encryption.innochecksum '8k,ctr,innodb,strict_full_crc32' w1 [ pass ]  4677
encryption.innochecksum '8k,ctr,innodb,strict_full_crc32' w2 [ pass ]  4696
encryption.innochecksum '8k,ctr,innodb,strict_full_crc32' w1 [ pass ]  3898
encryption.innochecksum '4k,cbc,innodb,strict_full_crc32' w3 [ pass ]  127358
encryption.innochecksum '16k,cbc,innodb,strict_crc32' w4 [ fail ]
        Test ended at 2021-10-25 21:39:13

CURRENT_TEST: encryption.innochecksum
mysqltest: At line 41: query 'INSERT INTO t3 SELECT * FROM t1' failed: <Unknown> (2013): Lost connection to server during query

The result from queries just before the failure was:
SET GLOBAL innodb_file_per_table = ON;
set global innodb_compression_algorithm = 1;
# Create and populate a tables
CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB ENCRYPTED=YES ENCRYPTION_KEY_ID=4;
CREATE TABLE t2 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB ROW_FORMAT=COMPRESSED ENCRYPTED=YES ENCRYPTION_KEY_ID=4;
CREATE TABLE t3 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB ROW_FORMAT=COMPRESSED ENCRYPTED=NO;
CREATE TABLE t4 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB PAGE_COMPRESSED=1;
CREATE TABLE t5 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB PAGE_COMPRESSED=1 ENCRYPTED=YES ENCRYPTION_KEY_ID=4;
CREATE TABLE t6 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT) ENGINE=InnoDB;

Server [mysqld.1 - pid: 15380, winpid: 15380, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
$ /home/dan/repos/build-mariadb-server-10.6/sql/mariadbd --defaults-group-suffix=.1 --defaults-file=/home/dan/repos/build-mariadb-server-10.6/mysql-test/var/4/my.cnf --log-output=file --innodb-page-size=16K --skip-innodb-read-only-compressed --innodb-checksum-algorithm=strict_crc32 --innodb-flush-sync=OFF --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --plugin-load-add=file_key_management.so --loose-file-key-management --loose-file-key-management-filename=/home/dan/repos/mariadb-server-10.6/mysql-test/std_data/keys.txt --file-key-management-encryption-algorithm=aes_cbc --skip-innodb-read-only-compressed --core-file --loose-debug-sync-timeout=300
2021-10-25 21:28:56 0 [Note] /home/dan/repos/build-mariadb-server-10.6/sql/mariadbd (server 10.6.5-MariaDB-log) starting as process 15381 ...
2021-10-25 21:28:56 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32190)
2021-10-25 21:28:56 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
2021-10-25 21:28:56 0 [Note] Plugin 'partition' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'SEQUENCE' is disabled.
2021-10-25 21:28:56 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-10-25 21:28:56 0 [Note] InnoDB: Number of pools: 1
2021-10-25 21:28:56 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2021-10-25 21:28:56 0 [Note] InnoDB: Using liburing
2021-10-25 21:28:56 0 [Note] InnoDB: Initializing buffer pool, total size = 8388608, chunk size = 8388608
2021-10-25 21:28:56 0 [Note] InnoDB: Completed initialization of buffer pool
2021-10-25 21:28:56 0 [Note] InnoDB: 128 rollback segments are active.
2021-10-25 21:28:56 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-10-25 21:28:56 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-10-25 21:28:56 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2021-10-25 21:28:56 0 [Note] InnoDB: 10.6.5 started; log sequence number 43637; transaction id 17
2021-10-25 21:28:56 0 [Note] InnoDB: Loading buffer pool(s) from /home/dan/repos/build-mariadb-server-10.6/mysql-test/var/4/mysqld.1/data/ib_buffer_pool
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_FT_CONFIG' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_SYS_TABLESTATS' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_FT_DELETED' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_CMP' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'THREAD_POOL_WAITS' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_CMP_RESET' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'THREAD_POOL_QUEUES' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_FT_INDEX_TABLE' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'THREAD_POOL_GROUPS' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_CMP_PER_INDEX_RESET' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_FT_INDEX_CACHE' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_FT_BEING_DELETED' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_CMPMEM_RESET' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_FT_DEFAULT_STOPWORD' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_SYS_TABLESPACES' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'user_variables' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'INNODB_TABLESPACES_ENCRYPTION' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'THREAD_POOL_STATS' is disabled.
2021-10-25 21:28:56 0 [Note] Plugin 'unix_socket' is disabled.
2021-10-25 21:28:56 0 [Warning] /home/dan/repos/build-mariadb-server-10.6/sql/mariadbd: unknown variable 'loose-feedback-debug-startup-interval=20'
2021-10-25 21:28:56 0 [Warning] /home/dan/repos/build-mariadb-server-10.6/sql/mariadbd: unknown variable 'loose-feedback-debug-first-interval=60'
2021-10-25 21:28:56 0 [Warning] /home/dan/repos/build-mariadb-server-10.6/sql/mariadbd: unknown variable 'loose-feedback-debug-interval=60'
2021-10-25 21:28:56 0 [Warning] /home/dan/repos/build-mariadb-server-10.6/sql/mariadbd: unknown option '--loose-pam-debug'
2021-10-25 21:28:56 0 [Warning] /home/dan/repos/build-mariadb-server-10.6/sql/mariadbd: unknown option '--loose-aria'
2021-10-25 21:28:56 0 [Warning] /home/dan/repos/build-mariadb-server-10.6/sql/mariadbd: unknown variable 'loose-debug-sync-timeout=300'
2021-10-25 21:28:56 0 [Note] Server socket created on IP: '127.0.0.1'.
2021-10-25 21:28:56 0 [Note] /home/dan/repos/build-mariadb-server-10.6/sql/mariadbd: ready for connections.
Version: '10.6.5-MariaDB-log'  socket: '/home/dan/repos/build-mariadb-server-10.6/mysql-test/var/tmp/4/mysqld.1.sock'  port: 16060  Source distribution
2021-10-25 21:28:56 0 [Note] InnoDB: Buffer pool(s) load completed at 211025 21:28:56
2021-10-25 21:39:11 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch
* Re: uring regression - lost write request
From: Pavel Begunkov @ 2021-10-25 11:25 UTC
To: Daniel Black; +Cc: linux-block, io-uring

On 10/25/21 12:09, Daniel Black wrote:
> On Mon, Oct 25, 2021 at 8:59 PM Pavel Begunkov <[email protected]> wrote:
>> To be clear, is the mainline 5.14 kernel affected by the issue?
>> Or does the problem exist only in debian/etc. kernel trees?
>
> Thanks, Pavel, for looking.
>
> I'm retesting https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.14.14/
> in earnest. I did get some assertions, but they may have been
> unrelated. The testing continues...

Thanks for the work on pinpointing it. I'll wait for your conclusion
then, it'll give us an idea what we should look for.

> [...]

--
Pavel Begunkov
* Re: uring regression - lost write request
From: Salvatore Bonaccorso @ 2021-10-30  7:30 UTC
To: Pavel Begunkov; +Cc: Daniel Black, linux-block, io-uring

Hi Daniel,

On Mon, Oct 25, 2021 at 12:25:01PM +0100, Pavel Begunkov wrote:
> On 10/25/21 12:09, Daniel Black wrote:
> > I'm retesting https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.14.14/
> > in earnest. I did get some assertions, but they may have been
> > unrelated. The testing continues...
>
> Thanks for the work on pinpointing it. I'll wait for your conclusion
> then, it'll give us an idea what we should look for.

Were you able to pinpoint the issue?

Regards,
Salvatore
* Re: uring regression - lost write request
From: Daniel Black @ 2021-11-01  7:28 UTC
To: Salvatore Bonaccorso; +Cc: Pavel Begunkov, linux-block, io-uring

[-- Attachment #1: Type: text/plain, Size: 1393 bytes --]

On Sat, Oct 30, 2021 at 6:30 PM Salvatore Bonaccorso <[email protected]> wrote:
> > > I'm retesting https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.14.14/
> > > in earnest. I did get some assertions, but they may have been
> > > unrelated. The testing continues...
> >
> > Thanks for the work on pinpointing it. I'll wait for your conclusion
> > then, it'll give us an idea what we should look for.
>
> Were you able to pinpoint the issue?

Retesting on the Ubuntu mainline 5.14.14 and 5.14.15 kernels, I was
unable to reproduce the issue in a VM.

Using the Fedora 34 5.14.14 and 5.14.15 kernels I am reasonably able
to reproduce this, and it is now reported as
https://bugzilla.redhat.com/show_bug.cgi?id=2018882.

I've so far been unable to reproduce this issue on 5.15.0-rc7 inside a
(Ubuntu 21.10) VM. Marko, using another heavy-flushing sysbench script
(modified version attached; it needs slightly lower specs and can be
used on a distro install), was able to see the fault (qps goes to 0)
using Debian sid userspace and 5.15-rc6/5.15-rc7 Ubuntu mainline
kernels:
https://jira.mariadb.org/browse/MDEV-26674?focusedCommentId=203645&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-203645

Note that mariadb-10.6.5 (not quite released) changes its defaults to
avoid this bug; the mtr options --mysqld=--innodb_use_native_aio=1
--nowarnings will still exercise it, however.

[-- Attachment #2: Mariarebench-MDEV-23855.sh --]
[-- Type: application/x-shellscript, Size: 3163 bytes --]
* Re: uring regression - lost write request
From: Daniel Black @ 2021-11-09 22:58 UTC
To: Salvatore Bonaccorso; +Cc: Pavel Begunkov, linux-block, io-uring

> On Sat, Oct 30, 2021 at 6:30 PM Salvatore Bonaccorso <[email protected]> wrote:
> > Were you able to pinpoint the issue?

While I have been unable to reproduce this on a single CPU, Marko can
repeat a stall on a dual Broadwell chipset on these kernels:

* 5.15.1 - https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.1
* 5.14.16 - https://packages.debian.org/sid/linux-image-5.14.0-4-amd64

Detailed observations:
https://jira.mariadb.org/browse/MDEV-26674

The previous script has been adapted to use the MariaDB-10.6 packages
and sysbench to demonstrate a workload; I've changed Marko's script to
work with the distro packages and to use innodb_use_native_aio=1.

MariaDB packages:

https://mariadb.org/download/?t=repo-config
(this needs a distro that ships the liburing userspace library)

Script:

https://jira.mariadb.org/secure/attachment/60358/Mariabench-MDEV-26674-io_uring-1

The failure state is reached either when the sysbench prepare stalls
or when the tps figure printed at 5-second intervals falls to 0.
* Re: uring regression - lost write request
From: Jens Axboe @ 2021-11-09 23:24 UTC
To: Daniel Black, Salvatore Bonaccorso; +Cc: Pavel Begunkov, linux-block, io-uring

On 11/9/21 3:58 PM, Daniel Black wrote:
> While I have been unable to reproduce this on a single CPU, Marko can
> repeat a stall on a dual Broadwell chipset on these kernels:
>
> * 5.15.1 - https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.1
> * 5.14.16 - https://packages.debian.org/sid/linux-image-5.14.0-4-amd64
>
> Detailed observations:
> https://jira.mariadb.org/browse/MDEV-26674
>
> [...]
>
> The failure state is reached either when the sysbench prepare stalls
> or when the tps figure printed at 5-second intervals falls to 0.

Thanks, this is most useful! I'll take a look at this.

--
Jens Axboe
* Re: uring regression - lost write request
From: Jens Axboe @ 2021-11-10 18:01 UTC
To: Daniel Black, Salvatore Bonaccorso; +Cc: Pavel Begunkov, linux-block, io-uring

On 11/9/21 4:24 PM, Jens Axboe wrote:
> On 11/9/21 3:58 PM, Daniel Black wrote:
>> [...]
>> The failure state is reached either when the sysbench prepare stalls
>> or when the tps figure printed at 5-second intervals falls to 0.
>
> Thanks, this is most useful! I'll take a look at this.

Would it be possible to turn this into a full reproducer script?
Something that someone who knows nothing about mysqld/mariadb can just
run and have it reproduce. If I install the 10.6 packages from above,
then it doesn't seem to use io_uring or be linked against liburing.
The script also seems to assume that various things are set up
appropriately, like SRCTREE, MDIR, etc.

--
Jens Axboe
* Re: uring regression - lost write request
From: Daniel Black @ 2021-11-11  6:52 UTC
To: Jens Axboe; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring

> Would it be possible to turn this into a full reproducer script?
> Something that someone who knows nothing about mysqld/mariadb can just
> run and have it reproduce. If I install the 10.6 packages from above,
> then it doesn't seem to use io_uring or be linked against liburing.

Sorry Jens.

Hope containers are ok.

$ mkdir ~/mdbtest/

$ podman run -d -e MARIADB_ALLOW_EMPTY_ROOT_PASSWORD=1 \
    -e MARIADB_USER=sbtest -e MARIADB_PASSWORD=sbtest -e MARIADB_DATABASE=sbtest \
    --name mdb10.6-uring_test -v $HOME/mdbtest:/var/lib/mysql:Z \
    --security-opt seccomp=unconfined \
    quay.io/danielgblack/mariadb-test:10.6-impish-sysbench \
    --innodb_log_file_size=1G --innodb_buffer_pool_size=50G \
    --innodb_io_capacity=5000 --innodb_io_capacity_max=9000 \
    --innodb_flush_log_at_trx_commit=0 --innodb_adaptive_flushing_lwm=0 \
    --innodb-adaptive-flushing=1 --innodb_flush_neighbors=1 \
    --innodb-use-native-aio=1 --innodb_file-per-table=1 \
    --innodb-fast-shutdown=0 --innodb-flush-method=O_DIRECT \
    --innodb_lru_scan_depth=1024 --innodb_lru_flush_size=256

# Drop the 50G buffer pool size down if you don't have that much RAM;
# it is not critical to the reproduction. The IO capacity settings
# should be about what the hardware can actually do, otherwise gaps of
# 0 tps will appear without the bug being the cause.

The log should contain the first and last lines shown here:

$ podman logs mdb10.6-uring_test
...
2021-11-11 6:06:49 0 [Warning] innodb_use_native_aio may cause hangs with this kernel 5.15.0-0.rc7.20211028git1fc596a56b33.56.fc36.x86_64; see https://jira.mariadb.org/browse/MDEV-26674
2021-11-11 6:06:49 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-11-11 6:06:49 0 [Note] InnoDB: Number of pools: 1
2021-11-11 6:06:49 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2021-11-11 6:06:49 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
2021-11-11 6:06:49 0 [Note] InnoDB: Using liburing

$ podman exec mdb10.6-uring_test sysbench \
    /usr/share/sysbench/oltp_update_index.lua --mysql-password=sbtest \
    --percentile=99 --tables=8 --table_size=2000000 prepare
Creating table 'sbtest1'...
Inserting 2000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
Creating table 'sbtest2'...
Inserting 2000000 records into 'sbtest2'
Creating a secondary index on 'sbtest2'...
Creating table 'sbtest3'...
Inserting 2000000 records into 'sbtest3'
Creating a secondary index on 'sbtest3'...
Creating table 'sbtest4'...
Inserting 2000000 records into 'sbtest4'
Creating a secondary index on 'sbtest4'...
Creating table 'sbtest5'...
Inserting 2000000 records into 'sbtest5'
Creating a secondary index on 'sbtest5'...
Creating table 'sbtest6'...
Inserting 2000000 records into 'sbtest6'
Creating a secondary index on 'sbtest6'...
Creating table 'sbtest7'...
Inserting 2000000 records into 'sbtest7'
Creating a secondary index on 'sbtest7'...
Creating table 'sbtest8'...
Inserting 2000000 records into 'sbtest8'
Creating a secondary index on 'sbtest8'...

# Adjust --threads to the number of hardware threads available;
# --time is the length of the test.

$ podman exec mdb10.6-uring_test sysbench \
    /usr/share/sysbench/oltp_update_index.lua --mysql-password=sbtest \
    --percentile=99 --tables=8 --table_size=2000000 --rand-seed=42 \
    --rand-type=uniform --max-requests=0 --time=600 \
    --report-interval=5 --threads=64 run

Eventually, after the innodb_fatal_semaphore_wait_threshold
(https://mariadb.com/kb/en/innodb-system-variables/#innodb_fatal_semaphore_wait_threshold)
of 600 seconds, podman logs mdb10.6-uring_test will contain an error
like:

2021-10-07 17:06:43 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
211007 17:06:43 [ERROR] mysqld got signal 6 ;

Restarting the container on the same populated ~/mdbtest volume can be
slow due to recovery time; remove the contents and repeat the prepare
step instead.

Cleanup:

podman kill mdb10.6-uring_test
podman rm mdb10.6-uring_test
sudo rm -rf ~/mdbtest
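As an aside to the reproducer above: the pattern InnoDB submits here is
many concurrent writes to the same few files, pushed through io_uring
and punted to the kernel's io-wq workers. Below is a minimal liburing
sketch of that access pattern for experimentation only; the file name
"testfile", the queue depth, block size, and batch count are arbitrary
choices, and this is not a verified reproducer of the stall.

/*
 * Minimal liburing write stressor (sketch only). Assumption: the
 * trigger is many in-flight buffered writes to one file, forced onto
 * io-wq via IOSQE_ASYNC. Build: gcc -O2 -o uring-stress uring-stress.c -luring
 */
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define QD      64      /* submissions per batch; power of 2 */
#define BS      4096    /* bytes per write */
#define BATCHES 2048    /* total batches to run */

int main(void)
{
	struct io_uring ring;
	char *buf;
	int fd, i, j;

	fd = open("testfile", O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (posix_memalign((void **) &buf, 4096, BS))
		return 1;
	memset(buf, 0xaa, BS);

	if (io_uring_queue_init(QD, &ring, 0)) {
		perror("io_uring_queue_init");
		return 1;
	}

	for (i = 0; i < BATCHES; i++) {
		/* QD sqes always fit in a QD-entry ring between submits */
		for (j = 0; j < QD; j++) {
			struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

			/* buffered writes to the same file are hashed by
			 * io-wq and executed serially */
			io_uring_prep_write(sqe, fd, buf, BS,
					    (off_t) ((i * QD + j) % 1024) * BS);
			sqe->flags |= IOSQE_ASYNC;	/* punt to io-wq */
		}
		io_uring_submit_and_wait(&ring, QD);
		for (j = 0; j < QD; j++) {
			struct io_uring_cqe *cqe;

			if (io_uring_wait_cqe(&ring, &cqe))
				break;
			if (cqe->res != BS)
				fprintf(stderr, "write: res=%d\n", cqe->res);
			io_uring_cqe_seen(&ring, cqe);
		}
	}
	io_uring_queue_exit(&ring);
	return 0;
}

A stall would show up here as the loop making no progress; a healthy
kernel completes all batches quickly.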
* Re: uring regression - lost write request
From: Jens Axboe @ 2021-11-11 14:30 UTC
To: Daniel Black; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring

On 11/10/21 11:52 PM, Daniel Black wrote:
>> Would it be possible to turn this into a full reproducer script?
>> Something that someone who knows nothing about mysqld/mariadb can just
>> run and have it reproduce. If I install the 10.6 packages from above,
>> then it doesn't seem to use io_uring or be linked against liburing.
>
> Sorry Jens.
>
> Hope containers are ok.

I don't think I have a way to run that; I don't even know what podman
is, and neither does my distro. I'll google a bit and see if I can get
this running.

I'm fine building from source and running from there, as long as I
know what to do. Would that make it any easier? It definitely would
for me :-)

--
Jens Axboe
* Re: uring regression - lost write request
From: Jens Axboe @ 2021-11-11 14:58 UTC
To: Daniel Black; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring

On 11/11/21 7:30 AM, Jens Axboe wrote:
> I don't think I have a way to run that; I don't even know what podman
> is, and neither does my distro. I'll google a bit and see if I can get
> this running.
>
> I'm fine building from source and running from there, as long as I
> know what to do. Would that make it any easier? It definitely would
> for me :-)

The podman approach seemed to work, and I was able to run all three
steps. Didn't see any hangs. I'm going to try again, dropping down the
innodb pool size (the box only has 32G of RAM).

The storage can do a lot more than 5k IOPS, so I'm going to try
ramping that up.

Does your reproducer box have multiple NUMA nodes, or is it a single
socket/node box?

--
Jens Axboe
* Re: uring regression - lost write request
From: Jens Axboe @ 2021-11-11 15:29 UTC
To: Daniel Black; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring

On 11/11/21 7:58 AM, Jens Axboe wrote:
> The podman approach seemed to work, and I was able to run all three
> steps. Didn't see any hangs. I'm going to try again, dropping down the
> innodb pool size (the box only has 32G of RAM).
>
> The storage can do a lot more than 5k IOPS, so I'm going to try
> ramping that up.
>
> Does your reproducer box have multiple NUMA nodes, or is it a single
> socket/node box?

Doesn't seem to reproduce for me on current -git. What file system are
you using?

--
Jens Axboe
* Re: uring regression - lost write request
From: Jens Axboe @ 2021-11-11 16:19 UTC
To: Daniel Black; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring

On 11/11/21 8:29 AM, Jens Axboe wrote:
> Doesn't seem to reproduce for me on current -git. What file system are
> you using?

I seem to be able to hit it with ext4; I'm guessing it has more cases
that punt to buffered IO. As I initially suspected, I think this is a
race with buffered file write hashing. I have a debug patch that just
turns a regular non-NUMA box into multiple nodes, which may or may not
be needed to hit this, but I definitely can now. Looks like this:

Node7 DUMP
index=0, nr_w=1, max=128, r=0, f=1, h=0
  w=ffff8f5e8b8470c0, hashed=1/0, flags=2
  w=ffff8f5e95a9b8c0, hashed=1/0, flags=2
index=1, nr_w=0, max=127877, r=0, f=0, h=0
free_list
  worker=ffff8f5eaf2e0540
all_list
  worker=ffff8f5eaf2e0540

where we see node7 in this case having two work items pending, but the
worker state is stalled on the hash.

The hash logic was rewritten as part of the io-wq worker threads being
changed for 5.11, iirc, which is why that was my initial suspicion
here.

I'll take a look at this and make a test patch. Looks like you are able
to test self-built kernels, is that correct?

--
Jens Axboe
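Some background on the hashing discussed above: io-wq hashes buffered
writes by target file, so writes to one file execute serially on one
worker while unrelated work runs in parallel, and a worker that only
finds work for a busy hash parks itself until that hash is released.
The toy, single-threaded model below illustrates just the
serialization idea; it is not the concurrent fs/io-wq.c code, and the
work items and hash values in it are made up.

/*
 * Toy, single-threaded model of io-wq's hashed-work serialization,
 * for illustration only. Items sharing a hash (same target file) run
 * at most once per pass, so same-file writes serialize while other
 * work proceeds.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_WORK   6
#define NR_HASHES 4

struct work {
	int id;
	int hash;	/* same value == same target file */
	bool done;
};

int main(void)
{
	/* four writes to one file (hash 0), interleaved with others */
	struct work queue[NR_WORK] = {
		{0, 0, false}, {1, 0, false}, {2, 1, false},
		{3, 0, false}, {4, 2, false}, {5, 0, false},
	};
	int round = 0, remaining = NR_WORK;

	while (remaining) {
		bool hash_busy[NR_HASHES] = {false};

		printf("round %d:", round++);
		for (int i = 0; i < NR_WORK; i++) {
			struct work *w = &queue[i];

			if (w->done)
				continue;
			if (hash_busy[w->hash])
				continue;	/* hash taken: stays queued */
			hash_busy[w->hash] = true;
			w->done = true;
			remaining--;
			printf(" work%d(h%d)", w->id, w->hash);
		}
		printf("\n");
	}
	return 0;
}

Running it shows the hash-0 items completing one per round while the
hash-1 and hash-2 items finish in the first round; in the real,
concurrent code, the missed-wakeup window sits in the parking step
that this toy model leaves out.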
* Re: uring regression - lost write request
From: Jens Axboe @ 2021-11-11 16:55 UTC
To: Daniel Black; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring

On 11/11/21 9:19 AM, Jens Axboe wrote:
> I seem to be able to hit it with ext4; I'm guessing it has more cases
> that punt to buffered IO. As I initially suspected, I think this is a
> race with buffered file write hashing.
>
> [...]
>
> The hash logic was rewritten as part of the io-wq worker threads being
> changed for 5.11, iirc, which is why that was my initial suspicion
> here.
>
> I'll take a look at this and make a test patch. Looks like you are able
> to test self-built kernels, is that correct?

Can you try with this patch? It's against -git, but it will apply to
5.15 as well.

diff --git a/fs/io-wq.c b/fs/io-wq.c
index afd955d53db9..7917b8866dcc 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -423,9 +423,10 @@ static inline unsigned int io_get_work_hash(struct io_wq_work *work)
 	return work->flags >> IO_WQ_HASH_SHIFT;
 }
 
-static void io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
+static bool io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
 {
 	struct io_wq *wq = wqe->wq;
+	bool ret = false;
 
 	spin_lock_irq(&wq->hash->wait.lock);
 	if (list_empty(&wqe->wait.entry)) {
@@ -433,9 +434,11 @@ static void io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
 		if (!test_bit(hash, &wq->hash->map)) {
 			__set_current_state(TASK_RUNNING);
 			list_del_init(&wqe->wait.entry);
+			ret = true;
 		}
 	}
 	spin_unlock_irq(&wq->hash->wait.lock);
+	return ret;
 }
 
 static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct,
@@ -447,6 +450,7 @@ static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct,
 	unsigned int stall_hash = -1U;
 	struct io_wqe *wqe = worker->wqe;
 
+retry:
 	wq_list_for_each(node, prev, &acct->work_list) {
 		unsigned int hash;
 
@@ -475,14 +479,18 @@ static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct,
 	}
 
 	if (stall_hash != -1U) {
+		bool do_retry;
+
 		/*
 		 * Set this before dropping the lock to avoid racing with new
 		 * work being added and clearing the stalled bit.
 		 */
 		set_bit(IO_ACCT_STALLED_BIT, &acct->flags);
 		raw_spin_unlock(&wqe->lock);
-		io_wait_on_hash(wqe, stall_hash);
+		do_retry = io_wait_on_hash(wqe, stall_hash);
 		raw_spin_lock(&wqe->lock);
+		if (do_retry)
+			goto retry;
 	}
 
 	return NULL;

--
Jens Axboe
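The shape of the bug this patch targets is a classic lost wakeup: the
worker checks the hash bitmap, then registers itself to wait, and a
clear that lands between the two steps is never seen, so queued work
sits forever. The stand-alone sketch below shows the safe ordering
(re-check the condition under the lock before sleeping); it uses
pthreads rather than the kernel waitqueue primitives, and hash_busy
and owner are invented names for illustration only.

/*
 * Stand-alone illustration of check-then-sleep done safely (assumed
 * shape of the problem; io-wq uses waitqueues, not pthreads). The
 * waiter must re-check the condition under the lock before waiting,
 * otherwise a clear that lands after an unlocked check but before
 * the wait is a lost wakeup.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool hash_busy = true;	/* stand-in for the hash map bit */

static void *owner(void *arg)
{
	(void) arg;
	/* finish "executing" the hashed work, then wake waiters */
	usleep(1000);
	pthread_mutex_lock(&lock);
	hash_busy = false;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, owner, NULL);

	pthread_mutex_lock(&lock);
	/*
	 * Re-check under the lock in a loop before sleeping. Sleeping
	 * on a stale check is the lost-wakeup window: if the flag
	 * cleared before we registered as a waiter, no further signal
	 * is coming.
	 */
	while (hash_busy)
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);

	printf("hash clear, rescanning work list\n");
	pthread_join(t, NULL);
	return 0;
}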
* Re: uring regression - lost write request 2021-11-11 16:55 ` Jens Axboe @ 2021-11-11 17:28 ` Jens Axboe 2021-11-11 23:44 ` Jens Axboe 0 siblings, 1 reply; 35+ messages in thread From: Jens Axboe @ 2021-11-11 17:28 UTC (permalink / raw) To: Daniel Black; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring On 11/11/21 9:55 AM, Jens Axboe wrote: > On 11/11/21 9:19 AM, Jens Axboe wrote: >> On 11/11/21 8:29 AM, Jens Axboe wrote: >>> On 11/11/21 7:58 AM, Jens Axboe wrote: >>>> On 11/11/21 7:30 AM, Jens Axboe wrote: >>>>> On 11/10/21 11:52 PM, Daniel Black wrote: >>>>>>> Would it be possible to turn this into a full reproducer script? >>>>>>> Something that someone that knows nothing about mysqld/mariadb can just >>>>>>> run and have it reproduce. If I install the 10.6 packages from above, >>>>>>> then it doesn't seem to use io_uring or be linked against liburing. >>>>>> >>>>>> Sorry Jens. >>>>>> >>>>>> Hope containers are ok. >>>>> >>>>> Don't think I have a way to run that, don't even know what podman is >>>>> and nor does my distro. I'll google a bit and see if I can get this >>>>> running. >>>>> >>>>> I'm fine building from source and running from there, as long as I >>>>> know what to do. Would that make it any easier? It definitely would >>>>> for me :-) >>>> >>>> The podman approach seemed to work, and I was able to run all three >>>> steps. Didn't see any hangs. I'm going to try again dropping down >>>> the innodb pool size (box only has 32G of RAM). >>>> >>>> The storage can do a lot more than 5k IOPS, I'm going to try ramping >>>> that up. >>>> >>>> Does your reproducer box have multiple NUMA nodes, or is it a single >>>> socket/nod box? >>> >>> Doesn't seem to reproduce for me on current -git. What file system are >>> you using? >> >> I seem to be able to hit it with ext4, guessing it has more cases that >> punt to buffered IO. As I initially suspected, I think this is a race >> with buffered file write hashing. I have a debug patch that just turns >> a regular non-numa box into multi nodes, may or may not be needed be >> needed to hit this, but I definitely can now. Looks like this: >> >> Node7 DUMP >> index=0, nr_w=1, max=128, r=0, f=1, h=0 >> w=ffff8f5e8b8470c0, hashed=1/0, flags=2 >> w=ffff8f5e95a9b8c0, hashed=1/0, flags=2 >> index=1, nr_w=0, max=127877, r=0, f=0, h=0 >> free_list >> worker=ffff8f5eaf2e0540 >> all_list >> worker=ffff8f5eaf2e0540 >> >> where we seed node7 in this case having two work items pending, but the >> worker state is stalled on hash. >> >> The hash logic was rewritten as part of the io-wq worker threads being >> changed for 5.11 iirc, which is why that was my initial suspicion here. >> >> I'll take a look at this and make a test patch. Looks like you are able >> to test self-built kernels, is that correct? > > Can you try with this patch? It's against -git, but it will apply to > 5.15 as well. I think that one covered one potential gap, but I just managed to reproduce a stall even with it. So hang on testing that one, I'll send you something more complete when I have confidence in it. -- Jens Axboe ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: uring regression - lost write request 2021-11-11 17:28 ` Jens Axboe @ 2021-11-11 23:44 ` Jens Axboe 2021-11-12 6:25 ` Daniel Black 2021-11-14 20:33 ` Daniel Black 0 siblings, 2 replies; 35+ messages in thread From: Jens Axboe @ 2021-11-11 23:44 UTC (permalink / raw) To: Daniel Black; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring On 11/11/21 10:28 AM, Jens Axboe wrote: > On 11/11/21 9:55 AM, Jens Axboe wrote: >> On 11/11/21 9:19 AM, Jens Axboe wrote: >>> On 11/11/21 8:29 AM, Jens Axboe wrote: >>>> On 11/11/21 7:58 AM, Jens Axboe wrote: >>>>> On 11/11/21 7:30 AM, Jens Axboe wrote: >>>>>> On 11/10/21 11:52 PM, Daniel Black wrote: >>>>>>>> Would it be possible to turn this into a full reproducer script? >>>>>>>> Something that someone that knows nothing about mysqld/mariadb can just >>>>>>>> run and have it reproduce. If I install the 10.6 packages from above, >>>>>>>> then it doesn't seem to use io_uring or be linked against liburing. >>>>>>> >>>>>>> Sorry Jens. >>>>>>> >>>>>>> Hope containers are ok. >>>>>> >>>>>> Don't think I have a way to run that, don't even know what podman is >>>>>> and nor does my distro. I'll google a bit and see if I can get this >>>>>> running. >>>>>> >>>>>> I'm fine building from source and running from there, as long as I >>>>>> know what to do. Would that make it any easier? It definitely would >>>>>> for me :-) >>>>> >>>>> The podman approach seemed to work, and I was able to run all three >>>>> steps. Didn't see any hangs. I'm going to try again dropping down >>>>> the innodb pool size (box only has 32G of RAM). >>>>> >>>>> The storage can do a lot more than 5k IOPS, I'm going to try ramping >>>>> that up. >>>>> >>>>> Does your reproducer box have multiple NUMA nodes, or is it a single >>>>> socket/nod box? >>>> >>>> Doesn't seem to reproduce for me on current -git. What file system are >>>> you using? >>> >>> I seem to be able to hit it with ext4, guessing it has more cases that >>> punt to buffered IO. As I initially suspected, I think this is a race >>> with buffered file write hashing. I have a debug patch that just turns >>> a regular non-numa box into multi nodes, may or may not be needed be >>> needed to hit this, but I definitely can now. Looks like this: >>> >>> Node7 DUMP >>> index=0, nr_w=1, max=128, r=0, f=1, h=0 >>> w=ffff8f5e8b8470c0, hashed=1/0, flags=2 >>> w=ffff8f5e95a9b8c0, hashed=1/0, flags=2 >>> index=1, nr_w=0, max=127877, r=0, f=0, h=0 >>> free_list >>> worker=ffff8f5eaf2e0540 >>> all_list >>> worker=ffff8f5eaf2e0540 >>> >>> where we seed node7 in this case having two work items pending, but the >>> worker state is stalled on hash. >>> >>> The hash logic was rewritten as part of the io-wq worker threads being >>> changed for 5.11 iirc, which is why that was my initial suspicion here. >>> >>> I'll take a look at this and make a test patch. Looks like you are able >>> to test self-built kernels, is that correct? >> >> Can you try with this patch? It's against -git, but it will apply to >> 5.15 as well. > > I think that one covered one potential gap, but I just managed to > reproduce a stall even with it. So hang on testing that one, I'll send > you something more complete when I have confidence in it. Alright, give this one a go if you can. Against -git, but will apply to 5.15 as well. 
diff --git a/fs/io-wq.c b/fs/io-wq.c
index afd955d53db9..88202de519f6 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -423,9 +423,10 @@ static inline unsigned int io_get_work_hash(struct io_wq_work *work)
 	return work->flags >> IO_WQ_HASH_SHIFT;
 }
 
-static void io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
+static bool io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
 {
 	struct io_wq *wq = wqe->wq;
+	bool ret = false;
 
 	spin_lock_irq(&wq->hash->wait.lock);
 	if (list_empty(&wqe->wait.entry)) {
@@ -433,9 +434,11 @@ static void io_wait_on_hash(struct io_wqe *wqe, unsigned int hash)
 		if (!test_bit(hash, &wq->hash->map)) {
 			__set_current_state(TASK_RUNNING);
 			list_del_init(&wqe->wait.entry);
+			ret = true;
 		}
 	}
 	spin_unlock_irq(&wq->hash->wait.lock);
+	return ret;
 }
 
 static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct,
@@ -475,14 +478,21 @@ static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct,
 	}
 
 	if (stall_hash != -1U) {
+		bool unstalled;
+
 		/*
 		 * Set this before dropping the lock to avoid racing with new
 		 * work being added and clearing the stalled bit.
 		 */
 		set_bit(IO_ACCT_STALLED_BIT, &acct->flags);
 		raw_spin_unlock(&wqe->lock);
-		io_wait_on_hash(wqe, stall_hash);
+		unstalled = io_wait_on_hash(wqe, stall_hash);
 		raw_spin_lock(&wqe->lock);
+		if (unstalled) {
+			clear_bit(IO_ACCT_STALLED_BIT, &acct->flags);
+			if (wq_has_sleeper(&wqe->wq->hash->wait))
+				wake_up(&wqe->wq->hash->wait);
+		}
 	}
 
 	return NULL;
@@ -564,8 +574,11 @@ static void io_worker_handle_work(struct io_worker *worker)
 			io_wqe_enqueue(wqe, linked);
 
 		if (hash != -1U && !next_hashed) {
+			/* serialize hash clear with wake_up() */
+			spin_lock_irq(&wq->hash->wait.lock);
 			clear_bit(hash, &wq->hash->map);
 			clear_bit(IO_ACCT_STALLED_BIT, &acct->flags);
+			spin_unlock_irq(&wq->hash->wait.lock);
 			if (wq_has_sleeper(&wq->hash->wait))
 				wake_up(&wq->hash->wait);
 			raw_spin_lock(&wqe->lock);

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 35+ messages in thread
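
Two things changed versus the first attempt: an unstalled worker now clears
IO_ACCT_STALLED_BIT and wakes other sleepers itself, and the hash clear is
serialized with wake_up() under wq->hash->wait.lock. The second part is the
classic lost-wakeup shape: if the bit is cleared and the wakeup issued without
the waitqueue lock, both can land between a sleeper's last check of the bit and
its actually going to sleep. A self-contained pthread analogue of that shape
(toy code, not the kernel's waitqueue machinery; build with cc -pthread):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool hash_busy = true;	/* stands in for the bit in wq->hash->map */

static void *stalled_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	while (hash_busy)		/* check and sleep under one lock */
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
	printf("worker unstalled, rescans its work list\n");
	return NULL;
}

/* Buggy shape: clear the flag and wake without taking the lock. The
 * clear and the signal can both land after the worker saw hash_busy
 * as true but before it slept; the wakeup is then lost and the worker
 * sleeps forever with work still queued, which is the reported hang. */
static void finish_hashed_work_buggy(void)
{
	hash_busy = false;
	pthread_cond_signal(&cond);
}

/* Fixed shape: serialize the clear with the wakeup, as the patch does
 * by taking wq->hash->wait.lock around clear_bit(). */
static void finish_hashed_work_fixed(void)
{
	pthread_mutex_lock(&lock);
	hash_busy = false;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, stalled_worker, NULL);
	finish_hashed_work_fixed();	/* the buggy variant can hang here */
	pthread_join(t, NULL);
	return 0;
}
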
* Re: uring regression - lost write request 2021-11-11 23:44 ` Jens Axboe @ 2021-11-12 6:25 ` Daniel Black 2021-11-12 19:19 ` Salvatore Bonaccorso 2021-11-14 20:33 ` Daniel Black 1 sibling, 1 reply; 35+ messages in thread From: Daniel Black @ 2021-11-12 6:25 UTC (permalink / raw) To: Jens Axboe; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote: > > On 11/11/21 10:28 AM, Jens Axboe wrote: > > On 11/11/21 9:55 AM, Jens Axboe wrote: > >> On 11/11/21 9:19 AM, Jens Axboe wrote: > >>> On 11/11/21 8:29 AM, Jens Axboe wrote: > >>>> On 11/11/21 7:58 AM, Jens Axboe wrote: > >>>>> On 11/11/21 7:30 AM, Jens Axboe wrote: > >>>>>> On 11/10/21 11:52 PM, Daniel Black wrote: > >>>>>>>> Would it be possible to turn this into a full reproducer script? > >>>>>>>> Something that someone that knows nothing about mysqld/mariadb can just > >>>>>>>> run and have it reproduce. If I install the 10.6 packages from above, > >>>>>>>> then it doesn't seem to use io_uring or be linked against liburing. > >>>>>>> > >>>>>>> Sorry Jens. > >>>>>>> > >>>>>>> Hope containers are ok. > >>>>>> > >>>>>> Don't think I have a way to run that, don't even know what podman is > >>>>>> and nor does my distro. I'll google a bit and see if I can get this > >>>>>> running. > >>>>>> > >>>>>> I'm fine building from source and running from there, as long as I > >>>>>> know what to do. Would that make it any easier? It definitely would > >>>>>> for me :-) > >>>>> > >>>>> The podman approach seemed to work, Thanks for bearing with it. > >>>>> and I was able to run all three > >>>>> steps. Didn't see any hangs. I'm going to try again dropping down > >>>>> the innodb pool size (box only has 32G of RAM). > >>>>> > >>>>> The storage can do a lot more than 5k IOPS, I'm going to try ramping > >>>>> that up. Good. > >>>>> > >>>>> Does your reproducer box have multiple NUMA nodes, or is it a single > >>>>> socket/nod box? It was NUMA. Pre 5.14.14 I could produce it on a simpler test on a single node. > >>>> > >>>> Doesn't seem to reproduce for me on current -git. What file system are > >>>> you using? Yes ext4. > >>> > >>> I seem to be able to hit it with ext4, guessing it has more cases that > >>> punt to buffered IO. As I initially suspected, I think this is a race > >>> with buffered file write hashing. I have a debug patch that just turns > >>> a regular non-numa box into multi nodes, may or may not be needed be > >>> needed to hit this, but I definitely can now. Looks like this: > >>> > >>> Node7 DUMP > >>> index=0, nr_w=1, max=128, r=0, f=1, h=0 > >>> w=ffff8f5e8b8470c0, hashed=1/0, flags=2 > >>> w=ffff8f5e95a9b8c0, hashed=1/0, flags=2 > >>> index=1, nr_w=0, max=127877, r=0, f=0, h=0 > >>> free_list > >>> worker=ffff8f5eaf2e0540 > >>> all_list > >>> worker=ffff8f5eaf2e0540 > >>> > >>> where we seed node7 in this case having two work items pending, but the > >>> worker state is stalled on hash. > >>> > >>> The hash logic was rewritten as part of the io-wq worker threads being > >>> changed for 5.11 iirc, which is why that was my initial suspicion here. > >>> > >>> I'll take a look at this and make a test patch. Looks like you are able > >>> to test self-built kernels, is that correct? I've been libreating prebuilt kernels, however on the path to self-built again. Just searching for the holy penguin pee (from yaboot da(ze|ys)) to peesign(sic) EFI kernels. 
jk, working through docs: https://docs.fedoraproject.org/en-US/quick-docs/kernel/build-custom-kernel/ > >> Can you try with this patch? It's against -git, but it will apply to > >> 5.15 as well. > > > > I think that one covered one potential gap, but I just managed to > > reproduce a stall even with it. So hang on testing that one, I'll send > > you something more complete when I have confidence in it. > > Alright, give this one a go if you can. Against -git, but will apply to > 5.15 as well. Applied, built, attempting to boot.... ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: uring regression - lost write request 2021-11-12 6:25 ` Daniel Black @ 2021-11-12 19:19 ` Salvatore Bonaccorso 0 siblings, 0 replies; 35+ messages in thread From: Salvatore Bonaccorso @ 2021-11-12 19:19 UTC (permalink / raw) To: Daniel Black; +Cc: Jens Axboe, Pavel Begunkov, linux-block, io-uring Daniel, On Fri, Nov 12, 2021 at 05:25:31PM +1100, Daniel Black wrote: > On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote: > > > > On 11/11/21 10:28 AM, Jens Axboe wrote: > > > On 11/11/21 9:55 AM, Jens Axboe wrote: > > >> On 11/11/21 9:19 AM, Jens Axboe wrote: > > >>> On 11/11/21 8:29 AM, Jens Axboe wrote: > > >>>> On 11/11/21 7:58 AM, Jens Axboe wrote: > > >>>>> On 11/11/21 7:30 AM, Jens Axboe wrote: > > >>>>>> On 11/10/21 11:52 PM, Daniel Black wrote: > > >>>>>>>> Would it be possible to turn this into a full reproducer script? > > >>>>>>>> Something that someone that knows nothing about mysqld/mariadb can just > > >>>>>>>> run and have it reproduce. If I install the 10.6 packages from above, > > >>>>>>>> then it doesn't seem to use io_uring or be linked against liburing. > > >>>>>>> > > >>>>>>> Sorry Jens. > > >>>>>>> > > >>>>>>> Hope containers are ok. > > >>>>>> > > >>>>>> Don't think I have a way to run that, don't even know what podman is > > >>>>>> and nor does my distro. I'll google a bit and see if I can get this > > >>>>>> running. > > >>>>>> > > >>>>>> I'm fine building from source and running from there, as long as I > > >>>>>> know what to do. Would that make it any easier? It definitely would > > >>>>>> for me :-) > > >>>>> > > >>>>> The podman approach seemed to work, > > Thanks for bearing with it. > > > >>>>> and I was able to run all three > > >>>>> steps. Didn't see any hangs. I'm going to try again dropping down > > >>>>> the innodb pool size (box only has 32G of RAM). > > >>>>> > > >>>>> The storage can do a lot more than 5k IOPS, I'm going to try ramping > > >>>>> that up. > > Good. > > > >>>>> > > >>>>> Does your reproducer box have multiple NUMA nodes, or is it a single > > >>>>> socket/nod box? > > It was NUMA. Pre 5.14.14 I could produce it on a simpler test on a single node. > > > >>>> > > >>>> Doesn't seem to reproduce for me on current -git. What file system are > > >>>> you using? > > Yes ext4. > > > >>> > > >>> I seem to be able to hit it with ext4, guessing it has more cases that > > >>> punt to buffered IO. As I initially suspected, I think this is a race > > >>> with buffered file write hashing. I have a debug patch that just turns > > >>> a regular non-numa box into multi nodes, may or may not be needed be > > >>> needed to hit this, but I definitely can now. Looks like this: > > >>> > > >>> Node7 DUMP > > >>> index=0, nr_w=1, max=128, r=0, f=1, h=0 > > >>> w=ffff8f5e8b8470c0, hashed=1/0, flags=2 > > >>> w=ffff8f5e95a9b8c0, hashed=1/0, flags=2 > > >>> index=1, nr_w=0, max=127877, r=0, f=0, h=0 > > >>> free_list > > >>> worker=ffff8f5eaf2e0540 > > >>> all_list > > >>> worker=ffff8f5eaf2e0540 > > >>> > > >>> where we seed node7 in this case having two work items pending, but the > > >>> worker state is stalled on hash. > > >>> > > >>> The hash logic was rewritten as part of the io-wq worker threads being > > >>> changed for 5.11 iirc, which is why that was my initial suspicion here. > > >>> > > >>> I'll take a look at this and make a test patch. Looks like you are able > > >>> to test self-built kernels, is that correct? > > I've been libreating prebuilt kernels, however on the path to self-built again. 
>
> Just searching for the holy penguin pee (from yaboot da(ze|ys)) to
> peesign(sic) EFI kernels.
> jk, working through docs:
> https://docs.fedoraproject.org/en-US/quick-docs/kernel/build-custom-kernel/
>
> > >> Can you try with this patch? It's against -git, but it will apply to
> > >> 5.15 as well.
> >
> > > I think that one covered one potential gap, but I just managed to
> > > reproduce a stall even with it. So hang on testing that one, I'll send
> > > you something more complete when I have confidence in it.
> >
> > Alright, give this one a go if you can. Against -git, but will apply to
> > 5.15 as well.
>
> Applied, built, attempting to boot....

If you want to do the same for a Debian-based system, the following
might help to get the package built:
https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s4.2.2

I might otherwise be able to provide you a prebuilt package with the
patch (unsigned though, but it's best if you build and test it
directly).

Regards,
Salvatore

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: uring regression - lost write request 2021-11-11 23:44 ` Jens Axboe 2021-11-12 6:25 ` Daniel Black @ 2021-11-14 20:33 ` Daniel Black 2021-11-14 20:55 ` Jens Axboe 1 sibling, 1 reply; 35+ messages in thread From: Daniel Black @ 2021-11-14 20:33 UTC (permalink / raw) To: Jens Axboe; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote: > > Alright, give this one a go if you can. Against -git, but will apply to > 5.15 as well. Works. Thank you very much. https://jira.mariadb.org/browse/MDEV-26674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205599#comment-205599 Tested-by: Marko Mäkelä <[email protected]> > > > diff --git a/fs/io-wq.c b/fs/io-wq.c > index afd955d53db9..88202de519f6 100644 > --- a/fs/io-wq.c > +++ b/fs/io-wq.c > @@ -423,9 +423,10 @@ static inline unsigned int io_get_work_hash(struct io_wq_work *work) > return work->flags >> IO_WQ_HASH_SHIFT; > } > > -static void io_wait_on_hash(struct io_wqe *wqe, unsigned int hash) > +static bool io_wait_on_hash(struct io_wqe *wqe, unsigned int hash) > { > struct io_wq *wq = wqe->wq; > + bool ret = false; > > spin_lock_irq(&wq->hash->wait.lock); > if (list_empty(&wqe->wait.entry)) { > @@ -433,9 +434,11 @@ static void io_wait_on_hash(struct io_wqe *wqe, unsigned int hash) > if (!test_bit(hash, &wq->hash->map)) { > __set_current_state(TASK_RUNNING); > list_del_init(&wqe->wait.entry); > + ret = true; > } > } > spin_unlock_irq(&wq->hash->wait.lock); > + return ret; > } > > static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct, > @@ -475,14 +478,21 @@ static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct, > } > > if (stall_hash != -1U) { > + bool unstalled; > + > /* > * Set this before dropping the lock to avoid racing with new > * work being added and clearing the stalled bit. > */ > set_bit(IO_ACCT_STALLED_BIT, &acct->flags); > raw_spin_unlock(&wqe->lock); > - io_wait_on_hash(wqe, stall_hash); > + unstalled = io_wait_on_hash(wqe, stall_hash); > raw_spin_lock(&wqe->lock); > + if (unstalled) { > + clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); > + if (wq_has_sleeper(&wqe->wq->hash->wait)) > + wake_up(&wqe->wq->hash->wait); > + } > } > > return NULL; > @@ -564,8 +574,11 @@ static void io_worker_handle_work(struct io_worker *worker) > io_wqe_enqueue(wqe, linked); > > if (hash != -1U && !next_hashed) { > + /* serialize hash clear with wake_up() */ > + spin_lock_irq(&wq->hash->wait.lock); > clear_bit(hash, &wq->hash->map); > clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); > + spin_unlock_irq(&wq->hash->wait.lock); > if (wq_has_sleeper(&wq->hash->wait)) > wake_up(&wq->hash->wait); > raw_spin_lock(&wqe->lock); > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: uring regression - lost write request
  2021-11-14 20:33                       ` Daniel Black
@ 2021-11-14 20:55                         ` Jens Axboe
  2021-11-14 21:02                           ` Salvatore Bonaccorso
  2021-11-24  3:27                           ` Daniel Black
  0 siblings, 2 replies; 35+ messages in thread
From: Jens Axboe @ 2021-11-14 20:55 UTC (permalink / raw)
  To: Daniel Black; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring

On 11/14/21 1:33 PM, Daniel Black wrote:
> On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote:
>>
>> Alright, give this one a go if you can. Against -git, but will apply to
>> 5.15 as well.
>
>
> Works. Thank you very much.
>
> https://jira.mariadb.org/browse/MDEV-26674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205599#comment-205599
>
> Tested-by: Marko Mäkelä <[email protected]>

Awesome, thanks so much for reporting and testing. All bugs are shallow
when given a reproducer; that certainly helped a ton in figuring out
what this was and nailing a fix.

The patch is already upstream (and in the 5.15 stable queue), and I
provided 5.14 patches too.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: uring regression - lost write request
  2021-11-14 20:55                         ` Jens Axboe
@ 2021-11-14 21:02                           ` Salvatore Bonaccorso
  2021-11-14 21:03                             ` Jens Axboe
  0 siblings, 1 reply; 35+ messages in thread
From: Salvatore Bonaccorso @ 2021-11-14 21:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Daniel Black, Pavel Begunkov, linux-block, io-uring

Hi,

On Sun, Nov 14, 2021 at 01:55:20PM -0700, Jens Axboe wrote:
> On 11/14/21 1:33 PM, Daniel Black wrote:
> > On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote:
> >>
> >> Alright, give this one a go if you can. Against -git, but will apply to
> >> 5.15 as well.
> >
> >
> > Works. Thank you very much.
> >
> > https://jira.mariadb.org/browse/MDEV-26674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205599#comment-205599
> >
> > Tested-by: Marko Mäkelä <[email protected]>
>
> Awesome, thanks so much for reporting and testing. All bugs are shallow
> when given a reproducer; that certainly helped a ton in figuring out
> what this was and nailing a fix.
>
> The patch is already upstream (and in the 5.15 stable queue), and I
> provided 5.14 patches too.

FTR, I also cherry-picked the respective commit for Debian's upload of
5.15.2-1~exp1 to experimental as
https://salsa.debian.org/kernel-team/linux/-/commit/657413869fa29b97ec886cf62a420ab43b935fff .

Regards,
Salvatore

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: uring regression - lost write request 2021-11-14 21:02 ` Salvatore Bonaccorso @ 2021-11-14 21:03 ` Jens Axboe 0 siblings, 0 replies; 35+ messages in thread From: Jens Axboe @ 2021-11-14 21:03 UTC (permalink / raw) To: Salvatore Bonaccorso; +Cc: Daniel Black, Pavel Begunkov, linux-block, io-uring On 11/14/21 2:02 PM, Salvatore Bonaccorso wrote: > Hi, > > On Sun, Nov 14, 2021 at 01:55:20PM -0700, Jens Axboe wrote: >> On 11/14/21 1:33 PM, Daniel Black wrote: >>> On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote: >>>> >>>> Alright, give this one a go if you can. Against -git, but will apply to >>>> 5.15 as well. >>> >>> >>> Works. Thank you very much. >>> >>> https://jira.mariadb.org/browse/MDEV-26674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205599#comment-205599 >>> >>> Tested-by: Marko Mäkelä <[email protected]> >> >> Awesome, thanks so much for reporting and testing. All bugs are shallow >> when given a reproducer, that certainly helped a ton in figuring out >> what this was and nailing a fix. >> >> The patch is already upstream (and in the 5.15 stable queue), and I >> provided 5.14 patches too. > > FTR, I cherry-picked as well the respective commit for Debian's upload > of 5.15.2-1~exp1 to experimental as > https://salsa.debian.org/kernel-team/linux/-/commit/657413869fa29b97ec886cf62a420ab43b935fff Great thanks, you're beating stable :-) -- Jens Axboe ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: uring regression - lost write request
  2021-11-14 20:55                         ` Jens Axboe
  2021-11-14 21:02                           ` Salvatore Bonaccorso
@ 2021-11-24  3:27                           ` Daniel Black
  2021-11-24 15:28                             ` Jens Axboe
  1 sibling, 1 reply; 35+ messages in thread
From: Daniel Black @ 2021-11-24  3:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring

On Mon, Nov 15, 2021 at 7:55 AM Jens Axboe <[email protected]> wrote:
>
> On 11/14/21 1:33 PM, Daniel Black wrote:
> > On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote:
> >>
> >> Alright, give this one a go if you can. Against -git, but will apply to
> >> 5.15 as well.
> >
> >
> > Works. Thank you very much.
> >
> > https://jira.mariadb.org/browse/MDEV-26674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205599#comment-205599
> >
> > Tested-by: Marko Mäkelä <[email protected]>
>
> The patch is already upstream (and in the 5.15 stable queue), and I
> provided 5.14 patches too.

Jens,

I'm getting the same reproducer on 5.14.20
(https://bugzilla.redhat.com/show_bug.cgi?id=2018882#c3) though the
backport change logs indicate 5.14.19 has the patch.

Anything missing?

ext4 again (my mount is /dev/mapper/fedora_localhost--live-home on
/home type ext4 (rw,relatime,seclabel)).

The previous container should work, though a source option is there:

build deps: liburing-dev, bison, libevent-dev, ncurses-dev, c++
libraries/compiler

git clone --branch 10.6 --single-branch https://github.com/MariaDB/server mariadb-server
(cd mariadb-server; git submodule update --init --recursive)
mkdir build-mariadb-server
cd build-mariadb-server
cmake -DPLUGIN_{MROONGA,ROCKSDB,CONNECT,SPIDER,SPHINX,S3,COLUMNSTORE}=NO ../mariadb-server
(ensure liburing userspace is picked up)
cmake --build . --parallel
mysql-test/mtr --mysqld=--innodb_use_native_aio=1 --nowarnings --parallel=4 --force encryption.innochecksum{,,,,,}

Adding to mtr:
--mysqld=--innodb_io_capacity=50000 --mysqld=--innodb_io_capacity_max=90000
will probably trip this quicker.

5.15.3 is good
(https://jira.mariadb.org/browse/MDEV-26674?focusedCommentId=206787&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-206787).

^ permalink raw reply	[flat|nested] 35+ messages in thread
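
For context, the io_uring usage at issue is ordinary buffered writes submitted
through liburing, which the kernel may punt to an io-wq worker when they cannot
complete inline. Below is a minimal sketch of that submission pattern, not a
standalone reproducer; the file name is arbitrary and error handling is trimmed.
On an affected kernel, a punted write stuck behind the io-wq stall would leave
the final wait hanging forever, which is how the lost request shows up:

/* build: cc write_demo.c -luring */
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	static char buf[4096];
	int fd;

	fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0 || io_uring_queue_init(8, &ring, 0) < 0)
		return 1;
	memset(buf, 'x', sizeof(buf));

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_write(sqe, fd, buf, sizeof(buf), 0);	/* buffered write */
	io_uring_submit(&ring);

	/* on an affected kernel a write punted to a stalled io-wq worker
	 * never completes, so this wait would never return */
	if (io_uring_wait_cqe(&ring, &cqe) == 0) {
		printf("write completed, res=%d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	close(fd);
	return 0;
}
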
* Re: uring regression - lost write request 2021-11-24 3:27 ` Daniel Black @ 2021-11-24 15:28 ` Jens Axboe 2021-11-24 16:10 ` Jens Axboe 0 siblings, 1 reply; 35+ messages in thread From: Jens Axboe @ 2021-11-24 15:28 UTC (permalink / raw) To: Daniel Black; +Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring [-- Attachment #1: Type: text/plain, Size: 1119 bytes --] On 11/23/21 8:27 PM, Daniel Black wrote: > On Mon, Nov 15, 2021 at 7:55 AM Jens Axboe <[email protected]> wrote: >> >> On 11/14/21 1:33 PM, Daniel Black wrote: >>> On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote: >>>> >>>> Alright, give this one a go if you can. Against -git, but will apply to >>>> 5.15 as well. >>> >>> >>> Works. Thank you very much. >>> >>> https://jira.mariadb.org/browse/MDEV-26674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205599#comment-205599 >>> >>> Tested-by: Marko Mäkelä <[email protected]> >> >> The patch is already upstream (and in the 5.15 stable queue), and I >> provided 5.14 patches too. > > Jens, > > I'm getting the same reproducer on 5.14.20 > (https://bugzilla.redhat.com/show_bug.cgi?id=2018882#c3) though the > backport change logs indicate 5.14.19 has the patch. > > Anything missing? We might also need another patch that isn't in stable, I'm attaching it here. Any chance you can run 5.14.20/21 with this applied? If not, I'll do some sanity checking here and push it to -stable. -- Jens Axboe [-- Attachment #2: 0001-io-wq-split-bounded-and-unbounded-work-into-separate.patch --] [-- Type: text/x-patch, Size: 13384 bytes --] From 99e6a29dbda79e5e050be1ffd38dd36622f61af5 Mon Sep 17 00:00:00 2001 From: Jens Axboe <[email protected]> Date: Wed, 24 Nov 2021 08:26:11 -0700 Subject: [PATCH] io-wq: split bounded and unbounded work into separate lists commit f95dc207b93da9c88ddbb7741ec3730c6657b88e upstream. We've got a few issues that all boil down to the fact that we have one list of pending work items, yet two different types of workers to serve them. This causes some oddities around workers switching type and even hashed work vs regular work on the same bounded list. Just separate them out cleanly, similarly to how we already do accounting of what is running. That provides a clean separation and removes some corner cases that can cause stalls when handling IO that is punted to io-wq. 
Fixes: ecc53c48c13d ("io-wq: check max_worker limits if a worker transitions bound state") Signed-off-by: Jens Axboe <[email protected]> --- fs/io-wq.c | 156 +++++++++++++++++++++++------------------------------ 1 file changed, 68 insertions(+), 88 deletions(-) diff --git a/fs/io-wq.c b/fs/io-wq.c index 0890d85ba285..7d63299b4776 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -32,7 +32,7 @@ enum { }; enum { - IO_WQE_FLAG_STALLED = 1, /* stalled on hash */ + IO_ACCT_STALLED_BIT = 0, /* stalled on hash */ }; /* @@ -71,25 +71,24 @@ struct io_wqe_acct { unsigned max_workers; int index; atomic_t nr_running; + struct io_wq_work_list work_list; + unsigned long flags; }; enum { IO_WQ_ACCT_BOUND, IO_WQ_ACCT_UNBOUND, + IO_WQ_ACCT_NR, }; /* * Per-node worker thread pool */ struct io_wqe { - struct { - raw_spinlock_t lock; - struct io_wq_work_list work_list; - unsigned flags; - } ____cacheline_aligned_in_smp; + raw_spinlock_t lock; + struct io_wqe_acct acct[2]; int node; - struct io_wqe_acct acct[2]; struct hlist_nulls_head free_list; struct list_head all_list; @@ -195,11 +194,10 @@ static void io_worker_exit(struct io_worker *worker) do_exit(0); } -static inline bool io_wqe_run_queue(struct io_wqe *wqe) - __must_hold(wqe->lock) +static inline bool io_acct_run_queue(struct io_wqe_acct *acct) { - if (!wq_list_empty(&wqe->work_list) && - !(wqe->flags & IO_WQE_FLAG_STALLED)) + if (!wq_list_empty(&acct->work_list) && + !test_bit(IO_ACCT_STALLED_BIT, &acct->flags)) return true; return false; } @@ -208,7 +206,8 @@ static inline bool io_wqe_run_queue(struct io_wqe *wqe) * Check head of free list for an available worker. If one isn't available, * caller must create one. */ -static bool io_wqe_activate_free_worker(struct io_wqe *wqe) +static bool io_wqe_activate_free_worker(struct io_wqe *wqe, + struct io_wqe_acct *acct) __must_hold(RCU) { struct hlist_nulls_node *n; @@ -222,6 +221,10 @@ static bool io_wqe_activate_free_worker(struct io_wqe *wqe) hlist_nulls_for_each_entry_rcu(worker, n, &wqe->free_list, nulls_node) { if (!io_worker_get(worker)) continue; + if (io_wqe_get_acct(worker) != acct) { + io_worker_release(worker); + continue; + } if (wake_up_process(worker->task)) { io_worker_release(worker); return true; @@ -340,7 +343,7 @@ static void io_wqe_dec_running(struct io_worker *worker) if (!(worker->flags & IO_WORKER_F_UP)) return; - if (atomic_dec_and_test(&acct->nr_running) && io_wqe_run_queue(wqe)) { + if (atomic_dec_and_test(&acct->nr_running) && io_acct_run_queue(acct)) { atomic_inc(&acct->nr_running); atomic_inc(&wqe->wq->worker_refs); io_queue_worker_create(wqe, worker, acct); @@ -355,29 +358,10 @@ static void __io_worker_busy(struct io_wqe *wqe, struct io_worker *worker, struct io_wq_work *work) __must_hold(wqe->lock) { - bool worker_bound, work_bound; - - BUILD_BUG_ON((IO_WQ_ACCT_UNBOUND ^ IO_WQ_ACCT_BOUND) != 1); - if (worker->flags & IO_WORKER_F_FREE) { worker->flags &= ~IO_WORKER_F_FREE; hlist_nulls_del_init_rcu(&worker->nulls_node); } - - /* - * If worker is moving from bound to unbound (or vice versa), then - * ensure we update the running accounting. - */ - worker_bound = (worker->flags & IO_WORKER_F_BOUND) != 0; - work_bound = (work->flags & IO_WQ_WORK_UNBOUND) == 0; - if (worker_bound != work_bound) { - int index = work_bound ? 
IO_WQ_ACCT_UNBOUND : IO_WQ_ACCT_BOUND; - io_wqe_dec_running(worker); - worker->flags ^= IO_WORKER_F_BOUND; - wqe->acct[index].nr_workers--; - wqe->acct[index ^ 1].nr_workers++; - io_wqe_inc_running(worker); - } } /* @@ -419,44 +403,23 @@ static bool io_wait_on_hash(struct io_wqe *wqe, unsigned int hash) return ret; } -/* - * We can always run the work if the worker is currently the same type as - * the work (eg both are bound, or both are unbound). If they are not the - * same, only allow it if incrementing the worker count would be allowed. - */ -static bool io_worker_can_run_work(struct io_worker *worker, - struct io_wq_work *work) -{ - struct io_wqe_acct *acct; - - if (!(worker->flags & IO_WORKER_F_BOUND) != - !(work->flags & IO_WQ_WORK_UNBOUND)) - return true; - - /* not the same type, check if we'd go over the limit */ - acct = io_work_get_acct(worker->wqe, work); - return acct->nr_workers < acct->max_workers; -} - -static struct io_wq_work *io_get_next_work(struct io_wqe *wqe, +static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct, struct io_worker *worker) __must_hold(wqe->lock) { struct io_wq_work_node *node, *prev; struct io_wq_work *work, *tail; unsigned int stall_hash = -1U; + struct io_wqe *wqe = worker->wqe; - wq_list_for_each(node, prev, &wqe->work_list) { + wq_list_for_each(node, prev, &acct->work_list) { unsigned int hash; work = container_of(node, struct io_wq_work, list); - if (!io_worker_can_run_work(worker, work)) - break; - /* not hashed, can run anytime */ if (!io_wq_is_hashed(work)) { - wq_list_del(&wqe->work_list, node, prev); + wq_list_del(&acct->work_list, node, prev); return work; } @@ -467,7 +430,7 @@ static struct io_wq_work *io_get_next_work(struct io_wqe *wqe, /* hashed, can run if not already running */ if (!test_and_set_bit(hash, &wqe->wq->hash->map)) { wqe->hash_tail[hash] = NULL; - wq_list_cut(&wqe->work_list, &tail->list, prev); + wq_list_cut(&acct->work_list, &tail->list, prev); return work; } if (stall_hash == -1U) @@ -483,12 +446,12 @@ static struct io_wq_work *io_get_next_work(struct io_wqe *wqe, * Set this before dropping the lock to avoid racing with new * work being added and clearing the stalled bit. */ - wqe->flags |= IO_WQE_FLAG_STALLED; + set_bit(IO_ACCT_STALLED_BIT, &acct->flags); raw_spin_unlock(&wqe->lock); unstalled = io_wait_on_hash(wqe, stall_hash); raw_spin_lock(&wqe->lock); if (unstalled) { - wqe->flags &= ~IO_WQE_FLAG_STALLED; + clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); if (wq_has_sleeper(&wqe->wq->hash->wait)) wake_up(&wqe->wq->hash->wait); } @@ -525,6 +488,7 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work); static void io_worker_handle_work(struct io_worker *worker) __releases(wqe->lock) { + struct io_wqe_acct *acct = io_wqe_get_acct(worker); struct io_wqe *wqe = worker->wqe; struct io_wq *wq = wqe->wq; bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state); @@ -539,7 +503,7 @@ static void io_worker_handle_work(struct io_worker *worker) * can't make progress, any work completion or insertion will * clear the stalled flag. 
*/ - work = io_get_next_work(wqe, worker); + work = io_get_next_work(acct, worker); if (work) __io_worker_busy(wqe, worker, work); @@ -575,7 +539,7 @@ static void io_worker_handle_work(struct io_worker *worker) /* serialize hash clear with wake_up() */ spin_lock_irq(&wq->hash->wait.lock); clear_bit(hash, &wq->hash->map); - wqe->flags &= ~IO_WQE_FLAG_STALLED; + clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); spin_unlock_irq(&wq->hash->wait.lock); if (wq_has_sleeper(&wq->hash->wait)) wake_up(&wq->hash->wait); @@ -594,6 +558,7 @@ static void io_worker_handle_work(struct io_worker *worker) static int io_wqe_worker(void *data) { struct io_worker *worker = data; + struct io_wqe_acct *acct = io_wqe_get_acct(worker); struct io_wqe *wqe = worker->wqe; struct io_wq *wq = wqe->wq; char buf[TASK_COMM_LEN]; @@ -609,7 +574,7 @@ static int io_wqe_worker(void *data) set_current_state(TASK_INTERRUPTIBLE); loop: raw_spin_lock_irq(&wqe->lock); - if (io_wqe_run_queue(wqe)) { + if (io_acct_run_queue(acct)) { io_worker_handle_work(worker); goto loop; } @@ -777,12 +742,13 @@ static void io_run_cancel(struct io_wq_work *work, struct io_wqe *wqe) static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work) { + struct io_wqe_acct *acct = io_work_get_acct(wqe, work); unsigned int hash; struct io_wq_work *tail; if (!io_wq_is_hashed(work)) { append: - wq_list_add_tail(&work->list, &wqe->work_list); + wq_list_add_tail(&work->list, &acct->work_list); return; } @@ -792,7 +758,7 @@ static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work) if (!tail) goto append; - wq_list_add_after(&work->list, &tail->list, &wqe->work_list); + wq_list_add_after(&work->list, &tail->list, &acct->work_list); } static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) @@ -814,10 +780,10 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) raw_spin_lock_irqsave(&wqe->lock, flags); io_wqe_insert_work(wqe, work); - wqe->flags &= ~IO_WQE_FLAG_STALLED; + clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); rcu_read_lock(); - do_create = !io_wqe_activate_free_worker(wqe); + do_create = !io_wqe_activate_free_worker(wqe, acct); rcu_read_unlock(); raw_spin_unlock_irqrestore(&wqe->lock, flags); @@ -870,6 +836,7 @@ static inline void io_wqe_remove_pending(struct io_wqe *wqe, struct io_wq_work *work, struct io_wq_work_node *prev) { + struct io_wqe_acct *acct = io_work_get_acct(wqe, work); unsigned int hash = io_get_work_hash(work); struct io_wq_work *prev_work = NULL; @@ -881,7 +848,7 @@ static inline void io_wqe_remove_pending(struct io_wqe *wqe, else wqe->hash_tail[hash] = NULL; } - wq_list_del(&wqe->work_list, &work->list, prev); + wq_list_del(&acct->work_list, &work->list, prev); } static void io_wqe_cancel_pending_work(struct io_wqe *wqe, @@ -890,22 +857,27 @@ static void io_wqe_cancel_pending_work(struct io_wqe *wqe, struct io_wq_work_node *node, *prev; struct io_wq_work *work; unsigned long flags; + int i; retry: raw_spin_lock_irqsave(&wqe->lock, flags); - wq_list_for_each(node, prev, &wqe->work_list) { - work = container_of(node, struct io_wq_work, list); - if (!match->fn(work, match->data)) - continue; - io_wqe_remove_pending(wqe, work, prev); - raw_spin_unlock_irqrestore(&wqe->lock, flags); - io_run_cancel(work, wqe); - match->nr_pending++; - if (!match->cancel_all) - return; + for (i = 0; i < IO_WQ_ACCT_NR; i++) { + struct io_wqe_acct *acct = io_get_acct(wqe, i == 0); - /* not safe to continue after unlock */ - goto retry; + wq_list_for_each(node, prev, &acct->work_list) { + work = 
container_of(node, struct io_wq_work, list); + if (!match->fn(work, match->data)) + continue; + io_wqe_remove_pending(wqe, work, prev); + raw_spin_unlock_irqrestore(&wqe->lock, flags); + io_run_cancel(work, wqe); + match->nr_pending++; + if (!match->cancel_all) + return; + + /* not safe to continue after unlock */ + goto retry; + } } raw_spin_unlock_irqrestore(&wqe->lock, flags); } @@ -966,18 +938,24 @@ static int io_wqe_hash_wake(struct wait_queue_entry *wait, unsigned mode, int sync, void *key) { struct io_wqe *wqe = container_of(wait, struct io_wqe, wait); + int i; list_del_init(&wait->entry); rcu_read_lock(); - io_wqe_activate_free_worker(wqe); + for (i = 0; i < IO_WQ_ACCT_NR; i++) { + struct io_wqe_acct *acct = &wqe->acct[i]; + + if (test_and_clear_bit(IO_ACCT_STALLED_BIT, &acct->flags)) + io_wqe_activate_free_worker(wqe, acct); + } rcu_read_unlock(); return 1; } struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data) { - int ret, node; + int ret, node, i; struct io_wq *wq; if (WARN_ON_ONCE(!data->free_work || !data->do_work)) @@ -1012,18 +990,20 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data) cpumask_copy(wqe->cpu_mask, cpumask_of_node(node)); wq->wqes[node] = wqe; wqe->node = alloc_node; - wqe->acct[IO_WQ_ACCT_BOUND].index = IO_WQ_ACCT_BOUND; - wqe->acct[IO_WQ_ACCT_UNBOUND].index = IO_WQ_ACCT_UNBOUND; wqe->acct[IO_WQ_ACCT_BOUND].max_workers = bounded; - atomic_set(&wqe->acct[IO_WQ_ACCT_BOUND].nr_running, 0); wqe->acct[IO_WQ_ACCT_UNBOUND].max_workers = task_rlimit(current, RLIMIT_NPROC); - atomic_set(&wqe->acct[IO_WQ_ACCT_UNBOUND].nr_running, 0); - wqe->wait.func = io_wqe_hash_wake; INIT_LIST_HEAD(&wqe->wait.entry); + wqe->wait.func = io_wqe_hash_wake; + for (i = 0; i < IO_WQ_ACCT_NR; i++) { + struct io_wqe_acct *acct = &wqe->acct[i]; + + acct->index = i; + atomic_set(&acct->nr_running, 0); + INIT_WQ_LIST(&acct->work_list); + } wqe->wq = wq; raw_spin_lock_init(&wqe->lock); - INIT_WQ_LIST(&wqe->work_list); INIT_HLIST_NULLS_HEAD(&wqe->free_list, 0); INIT_LIST_HEAD(&wqe->all_list); } -- 2.34.0 ^ permalink raw reply related [flat|nested] 35+ messages in thread
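
The heart of this patch is visible in the removed io_worker_can_run_work() and
the removed break in io_get_next_work(): with a single shared list, a worker
scanning for work stopped at the first item it was not allowed to take, so an
item of the other type sitting at the head could park everything behind it. A
toy illustration of that head-of-line blocking, simplified past the kernel's
worker-transition accounting and using invented names:

#include <stdio.h>

enum work_type { BOUND, UNBOUND };

struct toy_work {
	enum work_type type;
	const char *name;
};

/* One shared list, pre-patch shape: stop scanning at the first item
 * this worker type may not run (mirrors the removed "break"). */
static const struct toy_work *take_shared(const struct toy_work *q,
					  int n, enum work_type worker)
{
	for (int i = 0; i < n; i++) {
		if (q[i].type != worker)
			break;		/* head-of-line block */
		return &q[i];
	}
	return NULL;
}

/* Split per-type lists, post-patch shape: a worker only ever scans
 * its own type, so the other type can never block it. */
static const struct toy_work *take_split(const struct toy_work *q,
					 int n, enum work_type worker)
{
	for (int i = 0; i < n; i++)
		if (q[i].type == worker)
			return &q[i];
	return NULL;
}

int main(void)
{
	const struct toy_work q[] = {
		{ UNBOUND, "socket read" },
		{ BOUND,   "buffered write" },	/* queued behind the head */
	};
	const struct toy_work *w;

	w = take_shared(q, 2, BOUND);
	printf("shared list, bound worker gets: %s\n", w ? w->name : "nothing");
	w = take_split(q, 2, BOUND);
	printf("split lists, bound worker gets: %s\n", w ? w->name : "nothing");
	return 0;
}

Splitting the lists removes the corner cases outright rather than patching
around them, which is the design choice the commit message describes.
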
* Re: uring regression - lost write request 2021-11-24 15:28 ` Jens Axboe @ 2021-11-24 16:10 ` Jens Axboe 2021-11-24 16:18 ` Greg Kroah-Hartman 0 siblings, 1 reply; 35+ messages in thread From: Jens Axboe @ 2021-11-24 16:10 UTC (permalink / raw) To: Daniel Black Cc: Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring, stable, Greg Kroah-Hartman [-- Attachment #1: Type: text/plain, Size: 1265 bytes --] On 11/24/21 8:28 AM, Jens Axboe wrote: > On 11/23/21 8:27 PM, Daniel Black wrote: >> On Mon, Nov 15, 2021 at 7:55 AM Jens Axboe <[email protected]> wrote: >>> >>> On 11/14/21 1:33 PM, Daniel Black wrote: >>>> On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote: >>>>> >>>>> Alright, give this one a go if you can. Against -git, but will apply to >>>>> 5.15 as well. >>>> >>>> >>>> Works. Thank you very much. >>>> >>>> https://jira.mariadb.org/browse/MDEV-26674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205599#comment-205599 >>>> >>>> Tested-by: Marko Mäkelä <[email protected]> >>> >>> The patch is already upstream (and in the 5.15 stable queue), and I >>> provided 5.14 patches too. >> >> Jens, >> >> I'm getting the same reproducer on 5.14.20 >> (https://bugzilla.redhat.com/show_bug.cgi?id=2018882#c3) though the >> backport change logs indicate 5.14.19 has the patch. >> >> Anything missing? > > We might also need another patch that isn't in stable, I'm attaching > it here. Any chance you can run 5.14.20/21 with this applied? If not, > I'll do some sanity checking here and push it to -stable. Looks good to me - Greg, would you mind queueing this up for 5.14-stable? -- Jens Axboe [-- Attachment #2: 0001-io-wq-split-bounded-and-unbounded-work-into-separate.patch --] [-- Type: text/x-patch, Size: 13384 bytes --] From 99e6a29dbda79e5e050be1ffd38dd36622f61af5 Mon Sep 17 00:00:00 2001 From: Jens Axboe <[email protected]> Date: Wed, 24 Nov 2021 08:26:11 -0700 Subject: [PATCH] io-wq: split bounded and unbounded work into separate lists commit f95dc207b93da9c88ddbb7741ec3730c6657b88e upstream. We've got a few issues that all boil down to the fact that we have one list of pending work items, yet two different types of workers to serve them. This causes some oddities around workers switching type and even hashed work vs regular work on the same bounded list. Just separate them out cleanly, similarly to how we already do accounting of what is running. That provides a clean separation and removes some corner cases that can cause stalls when handling IO that is punted to io-wq. 
Fixes: ecc53c48c13d ("io-wq: check max_worker limits if a worker transitions bound state") Signed-off-by: Jens Axboe <[email protected]> --- fs/io-wq.c | 156 +++++++++++++++++++++++------------------------------ 1 file changed, 68 insertions(+), 88 deletions(-) diff --git a/fs/io-wq.c b/fs/io-wq.c index 0890d85ba285..7d63299b4776 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -32,7 +32,7 @@ enum { }; enum { - IO_WQE_FLAG_STALLED = 1, /* stalled on hash */ + IO_ACCT_STALLED_BIT = 0, /* stalled on hash */ }; /* @@ -71,25 +71,24 @@ struct io_wqe_acct { unsigned max_workers; int index; atomic_t nr_running; + struct io_wq_work_list work_list; + unsigned long flags; }; enum { IO_WQ_ACCT_BOUND, IO_WQ_ACCT_UNBOUND, + IO_WQ_ACCT_NR, }; /* * Per-node worker thread pool */ struct io_wqe { - struct { - raw_spinlock_t lock; - struct io_wq_work_list work_list; - unsigned flags; - } ____cacheline_aligned_in_smp; + raw_spinlock_t lock; + struct io_wqe_acct acct[2]; int node; - struct io_wqe_acct acct[2]; struct hlist_nulls_head free_list; struct list_head all_list; @@ -195,11 +194,10 @@ static void io_worker_exit(struct io_worker *worker) do_exit(0); } -static inline bool io_wqe_run_queue(struct io_wqe *wqe) - __must_hold(wqe->lock) +static inline bool io_acct_run_queue(struct io_wqe_acct *acct) { - if (!wq_list_empty(&wqe->work_list) && - !(wqe->flags & IO_WQE_FLAG_STALLED)) + if (!wq_list_empty(&acct->work_list) && + !test_bit(IO_ACCT_STALLED_BIT, &acct->flags)) return true; return false; } @@ -208,7 +206,8 @@ static inline bool io_wqe_run_queue(struct io_wqe *wqe) * Check head of free list for an available worker. If one isn't available, * caller must create one. */ -static bool io_wqe_activate_free_worker(struct io_wqe *wqe) +static bool io_wqe_activate_free_worker(struct io_wqe *wqe, + struct io_wqe_acct *acct) __must_hold(RCU) { struct hlist_nulls_node *n; @@ -222,6 +221,10 @@ static bool io_wqe_activate_free_worker(struct io_wqe *wqe) hlist_nulls_for_each_entry_rcu(worker, n, &wqe->free_list, nulls_node) { if (!io_worker_get(worker)) continue; + if (io_wqe_get_acct(worker) != acct) { + io_worker_release(worker); + continue; + } if (wake_up_process(worker->task)) { io_worker_release(worker); return true; @@ -340,7 +343,7 @@ static void io_wqe_dec_running(struct io_worker *worker) if (!(worker->flags & IO_WORKER_F_UP)) return; - if (atomic_dec_and_test(&acct->nr_running) && io_wqe_run_queue(wqe)) { + if (atomic_dec_and_test(&acct->nr_running) && io_acct_run_queue(acct)) { atomic_inc(&acct->nr_running); atomic_inc(&wqe->wq->worker_refs); io_queue_worker_create(wqe, worker, acct); @@ -355,29 +358,10 @@ static void __io_worker_busy(struct io_wqe *wqe, struct io_worker *worker, struct io_wq_work *work) __must_hold(wqe->lock) { - bool worker_bound, work_bound; - - BUILD_BUG_ON((IO_WQ_ACCT_UNBOUND ^ IO_WQ_ACCT_BOUND) != 1); - if (worker->flags & IO_WORKER_F_FREE) { worker->flags &= ~IO_WORKER_F_FREE; hlist_nulls_del_init_rcu(&worker->nulls_node); } - - /* - * If worker is moving from bound to unbound (or vice versa), then - * ensure we update the running accounting. - */ - worker_bound = (worker->flags & IO_WORKER_F_BOUND) != 0; - work_bound = (work->flags & IO_WQ_WORK_UNBOUND) == 0; - if (worker_bound != work_bound) { - int index = work_bound ? 
IO_WQ_ACCT_UNBOUND : IO_WQ_ACCT_BOUND; - io_wqe_dec_running(worker); - worker->flags ^= IO_WORKER_F_BOUND; - wqe->acct[index].nr_workers--; - wqe->acct[index ^ 1].nr_workers++; - io_wqe_inc_running(worker); - } } /* @@ -419,44 +403,23 @@ static bool io_wait_on_hash(struct io_wqe *wqe, unsigned int hash) return ret; } -/* - * We can always run the work if the worker is currently the same type as - * the work (eg both are bound, or both are unbound). If they are not the - * same, only allow it if incrementing the worker count would be allowed. - */ -static bool io_worker_can_run_work(struct io_worker *worker, - struct io_wq_work *work) -{ - struct io_wqe_acct *acct; - - if (!(worker->flags & IO_WORKER_F_BOUND) != - !(work->flags & IO_WQ_WORK_UNBOUND)) - return true; - - /* not the same type, check if we'd go over the limit */ - acct = io_work_get_acct(worker->wqe, work); - return acct->nr_workers < acct->max_workers; -} - -static struct io_wq_work *io_get_next_work(struct io_wqe *wqe, +static struct io_wq_work *io_get_next_work(struct io_wqe_acct *acct, struct io_worker *worker) __must_hold(wqe->lock) { struct io_wq_work_node *node, *prev; struct io_wq_work *work, *tail; unsigned int stall_hash = -1U; + struct io_wqe *wqe = worker->wqe; - wq_list_for_each(node, prev, &wqe->work_list) { + wq_list_for_each(node, prev, &acct->work_list) { unsigned int hash; work = container_of(node, struct io_wq_work, list); - if (!io_worker_can_run_work(worker, work)) - break; - /* not hashed, can run anytime */ if (!io_wq_is_hashed(work)) { - wq_list_del(&wqe->work_list, node, prev); + wq_list_del(&acct->work_list, node, prev); return work; } @@ -467,7 +430,7 @@ static struct io_wq_work *io_get_next_work(struct io_wqe *wqe, /* hashed, can run if not already running */ if (!test_and_set_bit(hash, &wqe->wq->hash->map)) { wqe->hash_tail[hash] = NULL; - wq_list_cut(&wqe->work_list, &tail->list, prev); + wq_list_cut(&acct->work_list, &tail->list, prev); return work; } if (stall_hash == -1U) @@ -483,12 +446,12 @@ static struct io_wq_work *io_get_next_work(struct io_wqe *wqe, * Set this before dropping the lock to avoid racing with new * work being added and clearing the stalled bit. */ - wqe->flags |= IO_WQE_FLAG_STALLED; + set_bit(IO_ACCT_STALLED_BIT, &acct->flags); raw_spin_unlock(&wqe->lock); unstalled = io_wait_on_hash(wqe, stall_hash); raw_spin_lock(&wqe->lock); if (unstalled) { - wqe->flags &= ~IO_WQE_FLAG_STALLED; + clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); if (wq_has_sleeper(&wqe->wq->hash->wait)) wake_up(&wqe->wq->hash->wait); } @@ -525,6 +488,7 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work); static void io_worker_handle_work(struct io_worker *worker) __releases(wqe->lock) { + struct io_wqe_acct *acct = io_wqe_get_acct(worker); struct io_wqe *wqe = worker->wqe; struct io_wq *wq = wqe->wq; bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state); @@ -539,7 +503,7 @@ static void io_worker_handle_work(struct io_worker *worker) * can't make progress, any work completion or insertion will * clear the stalled flag. 
*/ - work = io_get_next_work(wqe, worker); + work = io_get_next_work(acct, worker); if (work) __io_worker_busy(wqe, worker, work); @@ -575,7 +539,7 @@ static void io_worker_handle_work(struct io_worker *worker) /* serialize hash clear with wake_up() */ spin_lock_irq(&wq->hash->wait.lock); clear_bit(hash, &wq->hash->map); - wqe->flags &= ~IO_WQE_FLAG_STALLED; + clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); spin_unlock_irq(&wq->hash->wait.lock); if (wq_has_sleeper(&wq->hash->wait)) wake_up(&wq->hash->wait); @@ -594,6 +558,7 @@ static void io_worker_handle_work(struct io_worker *worker) static int io_wqe_worker(void *data) { struct io_worker *worker = data; + struct io_wqe_acct *acct = io_wqe_get_acct(worker); struct io_wqe *wqe = worker->wqe; struct io_wq *wq = wqe->wq; char buf[TASK_COMM_LEN]; @@ -609,7 +574,7 @@ static int io_wqe_worker(void *data) set_current_state(TASK_INTERRUPTIBLE); loop: raw_spin_lock_irq(&wqe->lock); - if (io_wqe_run_queue(wqe)) { + if (io_acct_run_queue(acct)) { io_worker_handle_work(worker); goto loop; } @@ -777,12 +742,13 @@ static void io_run_cancel(struct io_wq_work *work, struct io_wqe *wqe) static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work) { + struct io_wqe_acct *acct = io_work_get_acct(wqe, work); unsigned int hash; struct io_wq_work *tail; if (!io_wq_is_hashed(work)) { append: - wq_list_add_tail(&work->list, &wqe->work_list); + wq_list_add_tail(&work->list, &acct->work_list); return; } @@ -792,7 +758,7 @@ static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work) if (!tail) goto append; - wq_list_add_after(&work->list, &tail->list, &wqe->work_list); + wq_list_add_after(&work->list, &tail->list, &acct->work_list); } static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) @@ -814,10 +780,10 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) raw_spin_lock_irqsave(&wqe->lock, flags); io_wqe_insert_work(wqe, work); - wqe->flags &= ~IO_WQE_FLAG_STALLED; + clear_bit(IO_ACCT_STALLED_BIT, &acct->flags); rcu_read_lock(); - do_create = !io_wqe_activate_free_worker(wqe); + do_create = !io_wqe_activate_free_worker(wqe, acct); rcu_read_unlock(); raw_spin_unlock_irqrestore(&wqe->lock, flags); @@ -870,6 +836,7 @@ static inline void io_wqe_remove_pending(struct io_wqe *wqe, struct io_wq_work *work, struct io_wq_work_node *prev) { + struct io_wqe_acct *acct = io_work_get_acct(wqe, work); unsigned int hash = io_get_work_hash(work); struct io_wq_work *prev_work = NULL; @@ -881,7 +848,7 @@ static inline void io_wqe_remove_pending(struct io_wqe *wqe, else wqe->hash_tail[hash] = NULL; } - wq_list_del(&wqe->work_list, &work->list, prev); + wq_list_del(&acct->work_list, &work->list, prev); } static void io_wqe_cancel_pending_work(struct io_wqe *wqe, @@ -890,22 +857,27 @@ static void io_wqe_cancel_pending_work(struct io_wqe *wqe, struct io_wq_work_node *node, *prev; struct io_wq_work *work; unsigned long flags; + int i; retry: raw_spin_lock_irqsave(&wqe->lock, flags); - wq_list_for_each(node, prev, &wqe->work_list) { - work = container_of(node, struct io_wq_work, list); - if (!match->fn(work, match->data)) - continue; - io_wqe_remove_pending(wqe, work, prev); - raw_spin_unlock_irqrestore(&wqe->lock, flags); - io_run_cancel(work, wqe); - match->nr_pending++; - if (!match->cancel_all) - return; + for (i = 0; i < IO_WQ_ACCT_NR; i++) { + struct io_wqe_acct *acct = io_get_acct(wqe, i == 0); - /* not safe to continue after unlock */ - goto retry; + wq_list_for_each(node, prev, &acct->work_list) { + work = 
container_of(node, struct io_wq_work, list); + if (!match->fn(work, match->data)) + continue; + io_wqe_remove_pending(wqe, work, prev); + raw_spin_unlock_irqrestore(&wqe->lock, flags); + io_run_cancel(work, wqe); + match->nr_pending++; + if (!match->cancel_all) + return; + + /* not safe to continue after unlock */ + goto retry; + } } raw_spin_unlock_irqrestore(&wqe->lock, flags); } @@ -966,18 +938,24 @@ static int io_wqe_hash_wake(struct wait_queue_entry *wait, unsigned mode, int sync, void *key) { struct io_wqe *wqe = container_of(wait, struct io_wqe, wait); + int i; list_del_init(&wait->entry); rcu_read_lock(); - io_wqe_activate_free_worker(wqe); + for (i = 0; i < IO_WQ_ACCT_NR; i++) { + struct io_wqe_acct *acct = &wqe->acct[i]; + + if (test_and_clear_bit(IO_ACCT_STALLED_BIT, &acct->flags)) + io_wqe_activate_free_worker(wqe, acct); + } rcu_read_unlock(); return 1; } struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data) { - int ret, node; + int ret, node, i; struct io_wq *wq; if (WARN_ON_ONCE(!data->free_work || !data->do_work)) @@ -1012,18 +990,20 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data) cpumask_copy(wqe->cpu_mask, cpumask_of_node(node)); wq->wqes[node] = wqe; wqe->node = alloc_node; - wqe->acct[IO_WQ_ACCT_BOUND].index = IO_WQ_ACCT_BOUND; - wqe->acct[IO_WQ_ACCT_UNBOUND].index = IO_WQ_ACCT_UNBOUND; wqe->acct[IO_WQ_ACCT_BOUND].max_workers = bounded; - atomic_set(&wqe->acct[IO_WQ_ACCT_BOUND].nr_running, 0); wqe->acct[IO_WQ_ACCT_UNBOUND].max_workers = task_rlimit(current, RLIMIT_NPROC); - atomic_set(&wqe->acct[IO_WQ_ACCT_UNBOUND].nr_running, 0); - wqe->wait.func = io_wqe_hash_wake; INIT_LIST_HEAD(&wqe->wait.entry); + wqe->wait.func = io_wqe_hash_wake; + for (i = 0; i < IO_WQ_ACCT_NR; i++) { + struct io_wqe_acct *acct = &wqe->acct[i]; + + acct->index = i; + atomic_set(&acct->nr_running, 0); + INIT_WQ_LIST(&acct->work_list); + } wqe->wq = wq; raw_spin_lock_init(&wqe->lock); - INIT_WQ_LIST(&wqe->work_list); INIT_HLIST_NULLS_HEAD(&wqe->free_list, 0); INIT_LIST_HEAD(&wqe->all_list); } -- 2.34.0 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: uring regression - lost write request 2021-11-24 16:10 ` Jens Axboe @ 2021-11-24 16:18 ` Greg Kroah-Hartman 2021-11-24 16:22 ` Jens Axboe 0 siblings, 1 reply; 35+ messages in thread From: Greg Kroah-Hartman @ 2021-11-24 16:18 UTC (permalink / raw) To: Jens Axboe Cc: Daniel Black, Salvatore Bonaccorso, Pavel Begunkov, linux-block, io-uring, stable On Wed, Nov 24, 2021 at 09:10:25AM -0700, Jens Axboe wrote: > On 11/24/21 8:28 AM, Jens Axboe wrote: > > On 11/23/21 8:27 PM, Daniel Black wrote: > >> On Mon, Nov 15, 2021 at 7:55 AM Jens Axboe <[email protected]> wrote: > >>> > >>> On 11/14/21 1:33 PM, Daniel Black wrote: > >>>> On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote: > >>>>> > >>>>> Alright, give this one a go if you can. Against -git, but will apply to > >>>>> 5.15 as well. > >>>> > >>>> > >>>> Works. Thank you very much. > >>>> > >>>> https://jira.mariadb.org/browse/MDEV-26674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205599#comment-205599 > >>>> > >>>> Tested-by: Marko Mäkelä <[email protected]> > >>> > >>> The patch is already upstream (and in the 5.15 stable queue), and I > >>> provided 5.14 patches too. > >> > >> Jens, > >> > >> I'm getting the same reproducer on 5.14.20 > >> (https://bugzilla.redhat.com/show_bug.cgi?id=2018882#c3) though the > >> backport change logs indicate 5.14.19 has the patch. > >> > >> Anything missing? > > > > We might also need another patch that isn't in stable, I'm attaching > > it here. Any chance you can run 5.14.20/21 with this applied? If not, > > I'll do some sanity checking here and push it to -stable. > > Looks good to me - Greg, would you mind queueing this up for > 5.14-stable? 5.14 is end-of-life and not getting any more releases (the front page of kernel.org should show that.) If this needs to go anywhere else, please let me know. thanks, greg k-h ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: uring regression - lost write request
  2021-11-24 16:18 ` Greg Kroah-Hartman
@ 2021-11-24 16:22   ` Jens Axboe
  2021-11-24 22:52     ` Stefan Metzmacher
  2021-11-24 22:57     ` Daniel Black
  0 siblings, 2 replies; 35+ messages in thread
From: Jens Axboe @ 2021-11-24 16:22 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Daniel Black, Salvatore Bonaccorso, Pavel Begunkov, linux-block,
	io-uring, stable

On 11/24/21 9:18 AM, Greg Kroah-Hartman wrote:
> On Wed, Nov 24, 2021 at 09:10:25AM -0700, Jens Axboe wrote:
>> On 11/24/21 8:28 AM, Jens Axboe wrote:
>>> On 11/23/21 8:27 PM, Daniel Black wrote:
>>>> On Mon, Nov 15, 2021 at 7:55 AM Jens Axboe <[email protected]> wrote:
>>>>>
>>>>> On 11/14/21 1:33 PM, Daniel Black wrote:
>>>>>> On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <[email protected]> wrote:
>>>>>>>
>>>>>>> Alright, give this one a go if you can. Against -git, but will apply to
>>>>>>> 5.15 as well.
>>>>>>
>>>>>>
>>>>>> Works. Thank you very much.
>>>>>>
>>>>>> https://jira.mariadb.org/browse/MDEV-26674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205599#comment-205599
>>>>>>
>>>>>> Tested-by: Marko Mäkelä <[email protected]>
>>>>>
>>>>> The patch is already upstream (and in the 5.15 stable queue), and I
>>>>> provided 5.14 patches too.
>>>>
>>>> Jens,
>>>>
>>>> I'm getting the same reproducer on 5.14.20
>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=2018882#c3) though the
>>>> backport change logs indicate 5.14.19 has the patch.
>>>>
>>>> Anything missing?
>>>
>>> We might also need another patch that isn't in stable, I'm attaching
>>> it here. Any chance you can run 5.14.20/21 with this applied? If not,
>>> I'll do some sanity checking here and push it to -stable.
>>
>> Looks good to me - Greg, would you mind queueing this up for
>> 5.14-stable?
>
> 5.14 is end-of-life and not getting any more releases (the front page of
> kernel.org should show that.)

Oh, well I guess that settles that...

> If this needs to go anywhere else, please let me know.

Should be fine, previous 5.10 isn't affected and 5.15 is fine too as it
already has the patch.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: uring regression - lost write request
  2021-11-24 16:22 ` Jens Axboe
@ 2021-11-24 22:52   ` Stefan Metzmacher
  2021-11-25  0:58     ` Jens Axboe
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Metzmacher @ 2021-11-24 22:52 UTC (permalink / raw)
To: Jens Axboe, Greg Kroah-Hartman
Cc: Daniel Black, Salvatore Bonaccorso, Pavel Begunkov, linux-block,
	io-uring, stable

Hi Jens,

>>> Looks good to me - Greg, would you mind queueing this up for
>>> 5.14-stable?
>>
>> 5.14 is end-of-life and not getting any more releases (the front page of
>> kernel.org should show that.)
>
> Oh, well I guess that settles that...
>
>> If this needs to go anywhere else, please let me know.
>
> Should be fine, previous 5.10 isn't affected and 5.15 is fine too as it
> already has the patch.

Are 5.11 and 5.13 also affected? These are hwe kernels for ubuntu,
I may need to open a bug for them...

Thanks!
metze

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: uring regression - lost write request
  2021-11-24 22:52 ` Stefan Metzmacher
@ 2021-11-25  0:58   ` Jens Axboe
  2021-11-25 16:35     ` Stefan Metzmacher
  0 siblings, 1 reply; 35+ messages in thread
From: Jens Axboe @ 2021-11-25  0:58 UTC (permalink / raw)
To: Stefan Metzmacher, Greg Kroah-Hartman
Cc: Daniel Black, Salvatore Bonaccorso, Pavel Begunkov, linux-block,
	io-uring, stable

On 11/24/21 3:52 PM, Stefan Metzmacher wrote:
> Hi Jens,
>
>>>>> Looks good to me - Greg, would you mind queueing this up for
>>>>> 5.14-stable?
>>>>
>>>> 5.14 is end-of-life and not getting any more releases (the front page of
>>>> kernel.org should show that.)
>>>
>>> Oh, well I guess that settles that...
>>>
>>>> If this needs to go anywhere else, please let me know.
>>>
>>> Should be fine, previous 5.10 isn't affected and 5.15 is fine too as it
>>> already has the patch.
>
> Are 5.11 and 5.13 also affected? These are hwe kernels for ubuntu,
> I may need to open a bug for them...

Please do, then we can help get the appropriate patches lined up for
5.11/13. They should need the same set, basically what ended up in 5.14
plus the one I posted today.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: uring regression - lost write request
  2021-11-25  0:58 ` Jens Axboe
@ 2021-11-25 16:35   ` Stefan Metzmacher
  2021-11-25 17:11     ` Jens Axboe
  2022-02-09 23:01     ` Stefan Metzmacher
  0 siblings, 2 replies; 35+ messages in thread
From: Stefan Metzmacher @ 2021-11-25 16:35 UTC (permalink / raw)
To: Jens Axboe, Greg Kroah-Hartman
Cc: Daniel Black, Salvatore Bonaccorso, Pavel Begunkov, linux-block,
	io-uring, stable

Am 25.11.21 um 01:58 schrieb Jens Axboe:
> On 11/24/21 3:52 PM, Stefan Metzmacher wrote:
>> Hi Jens,
>>
>>>>>> Looks good to me - Greg, would you mind queueing this up for
>>>>>> 5.14-stable?
>>>>>
>>>>> 5.14 is end-of-life and not getting any more releases (the front page of
>>>>> kernel.org should show that.)
>>>>
>>>> Oh, well I guess that settles that...
>>>>
>>>>> If this needs to go anywhere else, please let me know.
>>>>
>>>> Should be fine, previous 5.10 isn't affected and 5.15 is fine too as it
>>>> already has the patch.
>>>
>>> Are 5.11 and 5.13 also affected? These are hwe kernels for ubuntu,
>>> I may need to open a bug for them...
>
> Please do, then we can help get the appropriate patches lined up for
> 5.11/13. They should need the same set, basically what ended up in 5.14
> plus the one I posted today.

Ok, I've created https://bugs.launchpad.net/bugs/1952222

Let's see what happens...

metze

^ permalink raw reply	[flat|nested] 35+ messages in thread
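[A quick way to sanity-check an installed kernel once the proposed
package lands, assuming the fix is mentioned in the package changelog;
the grep pattern below is a guess at its wording, not a known string:]

uname -r    # expect 5.14.0-1023-oem after rebooting into the proposed kernel
apt-get changelog linux-image-5.14.0-1023-oem | grep -iE 'io-wq|io_uring'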
* Re: uring regression - lost write request
  2021-11-25 16:35 ` Stefan Metzmacher
@ 2021-11-25 17:11   ` Jens Axboe
  2022-02-09 23:01   ` Stefan Metzmacher
  1 sibling, 0 replies; 35+ messages in thread
From: Jens Axboe @ 2021-11-25 17:11 UTC (permalink / raw)
To: Stefan Metzmacher, Greg Kroah-Hartman
Cc: Daniel Black, Salvatore Bonaccorso, Pavel Begunkov, linux-block,
	io-uring, stable

On 11/25/21 9:35 AM, Stefan Metzmacher wrote:
> Am 25.11.21 um 01:58 schrieb Jens Axboe:
>> On 11/24/21 3:52 PM, Stefan Metzmacher wrote:
>>> Hi Jens,
>>>
>>>>>>> Looks good to me - Greg, would you mind queueing this up for
>>>>>>> 5.14-stable?
>>>>>>
>>>>>> 5.14 is end-of-life and not getting any more releases (the front page of
>>>>>> kernel.org should show that.)
>>>>>
>>>>> Oh, well I guess that settles that...
>>>>>
>>>>>> If this needs to go anywhere else, please let me know.
>>>>>
>>>>> Should be fine, previous 5.10 isn't affected and 5.15 is fine too as it
>>>>> already has the patch.
>>>>
>>>> Are 5.11 and 5.13 also affected? These are hwe kernels for ubuntu,
>>>> I may need to open a bug for them...
>>>
>>> Please do, then we can help get the appropriate patches lined up for
>>> 5.11/13. They should need the same set, basically what ended up in 5.14
>>> plus the one I posted today.
>
> Ok, I've created https://bugs.launchpad.net/bugs/1952222
>
> Let's see what happens...

Let me know if I can help, should probably prepare a set for 5.11-stable
and 5.13-stable, but I don't know if the above kernels already have some
patches applied past last stable release of each...

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 35+ messages in thread
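[One way to answer the "what do those trees already carry" question is
to inspect each stable branch's io-wq history directly. A sketch,
assuming the kernel.org stable mirror and the fs/io-wq.c path used by
these series; swap in linux-5.11.y to audit 5.11:]

git clone --branch linux-5.13.y --single-branch \
    git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git stable-5.13
cd stable-5.13
# list every io-wq change picked up since the 5.13 release
git log --oneline v5.13..HEAD -- fs/io-wq.c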
* Re: uring regression - lost write request
  2021-11-25 16:35 ` Stefan Metzmacher
  2021-11-25 17:11   ` Jens Axboe
@ 2022-02-09 23:01   ` Stefan Metzmacher
  2022-02-10  0:10     ` Daniel Black
  1 sibling, 1 reply; 35+ messages in thread
From: Stefan Metzmacher @ 2022-02-09 23:01 UTC (permalink / raw)
To: Jens Axboe, Greg Kroah-Hartman
Cc: Daniel Black, Salvatore Bonaccorso, Pavel Begunkov, linux-block,
	io-uring, stable

Hi Jens,

>>>>>> Looks good to me - Greg, would you mind queueing this up for
>>>>>> 5.14-stable?
>>>>>
>>>>> 5.14 is end-of-life and not getting any more releases (the front page of
>>>>> kernel.org should show that.)
>>>>
>>>> Oh, well I guess that settles that...
>>>>
>>>>> If this needs to go anywhere else, please let me know.
>>>>
>>>> Should be fine, previous 5.10 isn't affected and 5.15 is fine too as it
>>>> already has the patch.
>>>
>>> Are 5.11 and 5.13 also affected? These are hwe kernels for ubuntu,
>>> I may need to open a bug for them...
>>
>> Please do, then we can help get the appropriate patches lined up for
>> 5.11/13. They should need the same set, basically what ended up in 5.14
>> plus the one I posted today.
>
> Ok, I've created https://bugs.launchpad.net/bugs/1952222

At least for 5.14 the patch is included in

https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-oem/+git/focal/log/?h=Ubuntu-oem-5.14-5.14.0-1023.25

https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-oem/+git/focal/commit/?h=Ubuntu-oem-5.14-5.14.0-1023.25&id=9e2b95e7c9dd103297e6a3ccd98a7bf11ef66921

apt-get install -V -t focal-proposed linux-oem-20.04d linux-tools-oem-20.04d
installs linux-image-5.14.0-1023-oem (5.14.0-1023.25)

Do we have a reproducer I can use to trigger the problem
and demonstrate that the bug is fixed?

Thanks!
metze

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: uring regression - lost write request
  2022-02-09 23:01 ` Stefan Metzmacher
@ 2022-02-10  0:10   ` Daniel Black
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Black @ 2022-02-10  0:10 UTC (permalink / raw)
To: Stefan Metzmacher
Cc: Jens Axboe, Greg Kroah-Hartman, Salvatore Bonaccorso,
	Pavel Begunkov, linux-block, io-uring, stable

Stefan,

On Thu, Feb 10, 2022 at 10:01 AM Stefan Metzmacher <[email protected]> wrote:
> > Ok, I've created https://bugs.launchpad.net/bugs/1952222
>
> At least for 5.14 the patch is included in
>
> https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-oem/+git/focal/log/?h=Ubuntu-oem-5.14-5.14.0-1023.25
>
> https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-oem/+git/focal/commit/?h=Ubuntu-oem-5.14-5.14.0-1023.25&id=9e2b95e7c9dd103297e6a3ccd98a7bf11ef66921
>
> apt-get install -V -t focal-proposed linux-oem-20.04d linux-tools-oem-20.04d
> installs linux-image-5.14.0-1023-oem (5.14.0-1023.25)

Thanks!

> Do we have a reproducer I can use to trigger the problem
> and demonstrate that the bug is fixed?

The original container and test from
https://lore.kernel.org/linux-block/CABVffEOpuViC9OyOuZg28sRfGK4GRc8cV0CnkOU2cM0RJyRhPw@mail.gmail.com/
will be sufficient.

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: uring regression - lost write request
  2021-11-24 16:22 ` Jens Axboe
  2021-11-24 22:52   ` Stefan Metzmacher
@ 2021-11-24 22:57   ` Daniel Black
  1 sibling, 0 replies; 35+ messages in thread
From: Daniel Black @ 2021-11-24 22:57 UTC (permalink / raw)
To: Jens Axboe
Cc: Greg Kroah-Hartman, Salvatore Bonaccorso, Pavel Begunkov,
	linux-block, io-uring, stable

On Thu, Nov 25, 2021 at 3:22 AM Jens Axboe <[email protected]> wrote:
>
> On 11/24/21 9:18 AM, Greg Kroah-Hartman wrote:
> > On Wed, Nov 24, 2021 at 09:10:25AM -0700, Jens Axboe wrote:
> >> On 11/24/21 8:28 AM, Jens Axboe wrote:
> >>> On 11/23/21 8:27 PM, Daniel Black wrote:
> >>>> On Mon, Nov 15, 2021 at 7:55 AM Jens Axboe <[email protected]> wrote:
> >>>> I'm getting the same reproducer on 5.14.20
> >>>> (https://bugzilla.redhat.com/show_bug.cgi?id=2018882#c3) though the
> >>>> backport change logs indicate 5.14.19 has the patch.
> >>>>
> >>>> Anything missing?
> >>>
> >>> We might also need another patch that isn't in stable, I'm attaching
> >>> it here. Any chance you can run 5.14.20/21 with this applied? If not,
> >>> I'll do some sanity checking here and push it to -stable.
> >>
> >> Looks good to me - Greg, would you mind queueing this up for
> >> 5.14-stable?
> >
> > 5.14 is end-of-life and not getting any more releases (the front page of
> > kernel.org should show that.)
>
> Oh, well I guess that settles that...

Certainly does. Thanks for looking and finding the patch.

> > If this needs to go anywhere else, please let me know.
>
> Should be fine, previous 5.10 isn't affected and 5.15 is fine too as it
> already has the patch.

Thank you.

https://github.com/MariaDB/server/commit/de7db5517de11a58d57d2a41d0bc6f38b6f92dd8

On Thu, Nov 25, 2021 at 9:52 AM Stefan Metzmacher <[email protected]> wrote:
> Are 5.11 and 5.13 also affected?

Yes.

> These are hwe kernels for ubuntu,
> I may need to open a bug for them...

Yes please.

^ permalink raw reply	[flat|nested] 35+ messages in thread
end of thread, other threads:[~2022-02-10  2:05 UTC | newest]

Thread overview: 35+ messages
     [not found] <CABVffENnJ8JkP7EtuUTqi+VkJDBFU37w1UXe4Q3cB7-ixxh0VA@mail.gmail.com>
2021-10-22  9:10 ` uring regression - lost write request Pavel Begunkov
2021-10-25  9:57 ` Pavel Begunkov
2021-10-25 11:09 ` Daniel Black
2021-10-25 11:25 ` Pavel Begunkov
2021-10-30  7:30 ` Salvatore Bonaccorso
2021-11-01  7:28 ` Daniel Black
2021-11-09 22:58 ` Daniel Black
2021-11-09 23:24 ` Jens Axboe
2021-11-10 18:01 ` Jens Axboe
2021-11-11  6:52 ` Daniel Black
2021-11-11 14:30 ` Jens Axboe
2021-11-11 14:58 ` Jens Axboe
2021-11-11 15:29 ` Jens Axboe
2021-11-11 16:19 ` Jens Axboe
2021-11-11 16:55 ` Jens Axboe
2021-11-11 17:28 ` Jens Axboe
2021-11-11 23:44 ` Jens Axboe
2021-11-12  6:25 ` Daniel Black
2021-11-12 19:19 ` Salvatore Bonaccorso
2021-11-14 20:33 ` Daniel Black
2021-11-14 20:55 ` Jens Axboe
2021-11-14 21:02 ` Salvatore Bonaccorso
2021-11-14 21:03 ` Jens Axboe
2021-11-24  3:27 ` Daniel Black
2021-11-24 15:28 ` Jens Axboe
2021-11-24 16:10 ` Jens Axboe
2021-11-24 16:18 ` Greg Kroah-Hartman
2021-11-24 16:22 ` Jens Axboe
2021-11-24 22:52 ` Stefan Metzmacher
2021-11-25  0:58 ` Jens Axboe
2021-11-25 16:35 ` Stefan Metzmacher
2021-11-25 17:11 ` Jens Axboe
2022-02-09 23:01 ` Stefan Metzmacher
2022-02-10  0:10 ` Daniel Black
2021-11-24 22:57 ` Daniel Black