如何用 SQL 求出「都关注了某100人」和「都收藏了某100帖子」的所有用户?

回复列表(165|隐藏机器人聊天)
  • @Ta / 2022-08-15 / /

    @老虎会游泳,补充测试完了

    修改了下结果列表,增加了方法二耗时对比

    基本结论

    1. 加不加DISTINCT耗时都一样
    2. 新增user表,再将方法一换成你的方法后,耗时更像方法二了。。

    结果

    SQL序号 结果 第一次耗时/s 第二次耗时/s 方法二第一次耗时/s 方法二第二次耗时/s
    1 61270 32.388 1.101 33.373 2.753
    2 333 65.031 1.728 65.338 3.485
    3 367 65.115 1.722 65.338 3.485
    4 2 104.504 2.343 98.5 4.954

    用到的SQL

    0. 新增user

    CREATE TABLE u(uid INTEGER PRIMARY KEY) WITHOUT ROWID;
    INSERT INTO u SELECT value FROM generate_series(1, 10000000);
    

    1. 每表1个参数,按照你的SQL测试

    SELECT count(*) FROM u
    WHERE uid IN (SELECT uid FROM a WHERE num = 50)
      AND uid IN (SELECT uid FROM b WHERE num = 100)
    

    2. 每表2个参数,按照你的SQL测试

    SELECT count(*) FROM u
    WHERE uid IN (SELECT uid FROM a WHERE num = 10)
      AND uid IN (SELECT uid FROM a WHERE num = 20)
      AND uid IN (SELECT uid FROM b WHERE num = 60)
      AND uid IN (SELECT uid FROM b WHERE num = 70)
    

    3. 每表2个参数,添加DISTINCT

    SELECT count(*) FROM u
    WHERE uid IN (SELECT DISTINCT uid FROM a WHERE num = 30)
      AND uid IN (SELECT DISTINCT uid FROM a WHERE num = 40)
      AND uid IN (SELECT DISTINCT uid FROM b WHERE num = 80)
      AND uid IN (SELECT DISTINCT uid FROM b WHERE num = 90)
    

    4. 每表3个参数,按照你的SQL测试

    SELECT count(*) FROM u
    WHERE uid IN (SELECT uid FROM a WHERE num = 200)
      AND uid IN (SELECT uid FROM a WHERE num = 210)
      AND uid IN (SELECT uid FROM a WHERE num = 220)
      AND uid IN (SELECT uid FROM b WHERE num = 230)
      AND uid IN (SELECT uid FROM b WHERE num = 240)
      AND uid IN (SELECT uid FROM b WHERE num = 250)
    
  • @Ta / 2022-08-16 / /

    @无名啊,嗯,所以我们能不能说,拼接JOIN SELECT WHERE语句是最佳解决方案?

    SELECT count(*)
      FROM (SELECT uid FROM a WHERE num = 10)
      JOIN (SELECT uid FROM b WHERE num = 60) USING(uid)
      JOIN (SELECT uid FROM a WHERE num = 20) USING(uid)
      JOIN (SELECT uid FROM b WHERE num = 70) USING(uid)
    
  • @Ta / 2022-08-16 / /

    @老虎会游泳

    嗯,所以我们能不能说,拼接JOIN SELECT WHERE语句是最佳解决方案?

    目前看来,SQLite上成立

    所以在“员工信息表”里不断排除范围可能比让其他表互相做JOIN更快。

    我原本也是这个思路:根据num取得uid,然后不断缩小uid数量

    但仔细想了想,觉得目前这个帖子里,没有一个SQL是按照这个思路执行的。

    因为,所有的表结构,都是(num, uid)。上一步积累的uid,没法用到下一步的过滤

    (我觉得B表可以改成(uid, num),来利用上A表查到的uid结果列表)

    (我再想想。。)

    不这么干,而直接全部变成(uid, num),就等着扫全表,来一个个用户检查“是否有num”吧

    SQLite倒是有个“跳跃扫描”,可以在一定程度上不遵循“最左原则”。但这样也是 1kw 次 uid: 1~1kw, num = ? 次匹配,和扫全表比,谁快还真的说不定)

  • @Ta / 2022-08-16 / /
    1.现在的sqlite3、postgresql都有json类型,直接内嵌结构,很多场景不需要链表。
    2.需要链表的地方,基本都有现成的数据,不需要去查那么一下。
    3.拆为代码层面多次查询,有助于封装成功能函数,添加缓存啥的,避免无所谓的重复代码和查询。
  • @Ta / 2022-08-17 / /

    @爱特

    1. 链表是啥?这样么?

      主键ID uid num 下一节点ID
      1 1 1 3
      2 2 1 0
      3 1 2 0
    2. 还是没懂你的“链表”,等你回复

    3. “添加缓存”是指啥?如:num = 1时,有uid: [1, 2]

  • @Ta / 2022-08-17 / /
    1.链表就是联表,因为我经常输入链表,输入法匹配到第一个就是这个。
    2.如果仅是这个需求,我会直接查询字段加索引,直接num = 1解决。
    3.实际上几乎所有增量数据都会涉及时间,具体业务具体实现,首次查询缓存,后续事件方式更新缓存,不需要再操作数据库。
    4.当然,你给的这个表结构,纯int,基本也不会有性能问题,怎么搞都一样,如果这个num相当于类型是动态的,我会直接json字段解决问题。
    不回复了,纠结这个意义不大,真遇到性能问题,十有八九都是冗余字段、事件总线方式去解决,我也是mongo用习惯了,用回sql也是经常使用json字段嵌结构进去,贼方便。
添加新回复
回复需要登录