Date: Wed, 3 Jun 2020 18:30:45 -0700
From: Andres Freund
To: Jens Axboe
Cc: io-uring@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    akpm@linux-foundation.org
Subject: Re: [PATCHSET v5 0/12] Add support for async buffered reads
Message-ID: <20200604013045.7gu7xopreusbdea2@alap3.anarazel.de>
References: <20200526195123.29053-1-axboe@kernel.dk>
 <20200604005916.niy2mejjcsx4sv6t@alap3.anarazel.de>

Hi,

On 2020-06-03 19:04:17 -0600, Jens Axboe wrote:
> > The workload that triggers the bug within a few seconds is postgres
> > doing a parallel sequential scan of a large table (and aggregating the
> > data, but that shouldn't matter). In the triggering case that boils down
> > to 9 processes sequentially reading a number of 1GB files (we chunk
> > tables internally into smaller files). Each process will read a 512kB
> > chunk of the file on its own, and then claim the next 512kB from a
> > shared memory location. Most of the IO will be READV requests, reading
> > 16 * 8kB into postgres' buffer pool (which may or may not be neighboring
> > 8kB pages).
>
> I'll try and reproduce this, any chance you have a test case that can
> be run so I don't have to write one from scratch? The more detailed
> instructions the better.

It shouldn't be too hard to write you a detailed script for reproducing
the issue, but it wouldn't be all that minimal a reproducer unless the
problem also triggers at a smaller scale (a 130GB database triggers it
reliably; small tables don't seem to). I'll try to write that up after
I set up kvm / a repro there.

One thing I forgot in the earlier email: I ran the benchmark under
'perf stat -a -e ...'. I'm fairly, but not absolutely, certain that the
problem also triggered without that. I don't think it's related, but I
thought I'd better mention it.

> I have a known issue with request starvation, wonder if that could be it.
> I'm going to rebase the branch on top of the aops->readahead() changes
> shortly, and fix that issue. Hopefully that's what's plaguing your run
> here, but if not, I'll hunt that one down.

FWIW, I had 'iostat -xm /dev/nvme1n1 1' running during this. Shortly
before the crash I see:

Device    r/s      rMB/s   rrqm/s   %rrqm  r_await  rareq-sz  w/s   wMB/s  wrqm/s  %wrqm  w_await  wareq-sz  d/s   dMB/s  drqm/s  %drqm  d_await  dareq-sz  f/s   f_await  aqu-sz  %util
nvme1n1   6221.00  956.09  3428.00  35.53  0.24     157.38    0.00  0.00   0.00    0.00   0.00     0.00      0.00  0.00   0.00    0.00   0.00     0.00      0.00  0.00     1.48    99.00

Device    r/s      rMB/s   rrqm/s   %rrqm  r_await  rareq-sz  w/s   wMB/s  wrqm/s  %wrqm  w_await  wareq-sz  d/s   dMB/s  drqm/s  %drqm  d_await  dareq-sz  f/s   f_await  aqu-sz  %util
nvme1n1   6456.00  978.83  3439.00  34.75  0.21     155.25    0.00  0.00   0.00    0.00   0.00     0.00      0.00  0.00   0.00    0.00   0.00     0.00      0.00  0.00     1.38    98.70

It's maybe also worth noting that in this workload the results are
*worse* than with 5.7-rc7 io_uring, so perhaps request starvation isn't
the worst guess...

Greetings,

Andres Freund
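
(Not part of the original message, but to make the quoted IO pattern
concrete: below is a minimal liburing sketch of one such READV, reading
16 separate 8kB buffers with a single SQE. Each submission covers
128kB, so a 512kB chunk claimed from shared memory would correspond to
four of them. The file path, queue depth and offset are placeholders,
not details from the thread.)

/* Minimal sketch, assuming liburing is installed; build with -luring.
 * Submits one READV SQE covering 16 separate 8kB buffers, mirroring the
 * 16 * 8kB reads described in the quoted workload. */
#include <liburing.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NVECS 16
#define BLKSZ 8192

int main(int argc, char **argv)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct iovec iov[NVECS];
	int fd, i, ret;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* 16 separate 8kB buffers, standing in for (possibly non-adjacent)
	 * pages of the buffer pool */
	for (i = 0; i < NVECS; i++) {
		iov[i].iov_base = malloc(BLKSZ);
		iov[i].iov_len = BLKSZ;
	}

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %s\n", strerror(-ret));
		return 1;
	}

	/* one vectored read of 16 * 8kB = 128kB starting at offset 0 */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_readv(sqe, fd, iov, NVECS, 0);
	io_uring_submit(&ring);

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret < 0) {
		fprintf(stderr, "wait_cqe: %s\n", strerror(-ret));
		return 1;
	}
	printf("readv returned %d\n", cqe->res);
	io_uring_cqe_seen(&ring, cqe);

	io_uring_queue_exit(&ring);
	close(fd);
	return 0;
}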