已掉线,重新登录

首页 > 绿虎论坛 > 历史版块 > 编程 > 数据库

如何用 SQL 求出「都关注了某100人」和「都收藏了某100帖子」的所有用户?


『回复列表(165|隐藏机器人聊天)』

160.

@老虎会游泳,补充测试完了

修改了下结果列表,增加了方法二耗时对比

基本结论

  1. 加不加DISTINCT耗时都一样
  2. 新增user表,再将方法一换成你的方法后,耗时更像方法二了。。

结果

SQL序号 结果 第一次耗时/s 第二次耗时/s 方法二第一次耗时/s 方法二第二次耗时/s
1 61270 32.388 1.101 33.373 2.753
2 333 65.031 1.728 65.338 3.485
3 367 65.115 1.722 65.338 3.485
4 2 104.504 2.343 98.5 4.954

用到的SQL

0. 新增user

CREATE TABLE u(uid INTEGER PRIMARY KEY) WITHOUT ROWID;
INSERT INTO u SELECT value FROM generate_series(1, 10000000);

1. 每表1个参数,按照你的SQL测试

SELECT count(*) FROM u
WHERE uid IN (SELECT uid FROM a WHERE num = 50)
  AND uid IN (SELECT uid FROM b WHERE num = 100)

2. 每表2个参数,按照你的SQL测试

SELECT count(*) FROM u
WHERE uid IN (SELECT uid FROM a WHERE num = 10)
  AND uid IN (SELECT uid FROM a WHERE num = 20)
  AND uid IN (SELECT uid FROM b WHERE num = 60)
  AND uid IN (SELECT uid FROM b WHERE num = 70)

3. 每表2个参数,添加DISTINCT

SELECT count(*) FROM u
WHERE uid IN (SELECT DISTINCT uid FROM a WHERE num = 30)
  AND uid IN (SELECT DISTINCT uid FROM a WHERE num = 40)
  AND uid IN (SELECT DISTINCT uid FROM b WHERE num = 80)
  AND uid IN (SELECT DISTINCT uid FROM b WHERE num = 90)

4. 每表3个参数,按照你的SQL测试

SELECT count(*) FROM u
WHERE uid IN (SELECT uid FROM a WHERE num = 200)
  AND uid IN (SELECT uid FROM a WHERE num = 210)
  AND uid IN (SELECT uid FROM a WHERE num = 220)
  AND uid IN (SELECT uid FROM b WHERE num = 230)
  AND uid IN (SELECT uid FROM b WHERE num = 240)
  AND uid IN (SELECT uid FROM b WHERE num = 250)
(/@Ta/2022-08-15 23:08//)

161.

@无名啊,嗯,所以我们能不能说,拼接JOIN SELECT WHERE语句是最佳解决方案?

SELECT count(*)
  FROM (SELECT uid FROM a WHERE num = 10)
  JOIN (SELECT uid FROM b WHERE num = 60) USING(uid)
  JOIN (SELECT uid FROM a WHERE num = 20) USING(uid)
  JOIN (SELECT uid FROM b WHERE num = 70) USING(uid)
(/@Ta/2022-08-16 00:54//)

162.

@老虎会游泳

嗯,所以我们能不能说,拼接JOIN SELECT WHERE语句是最佳解决方案?

目前看来,SQLite上成立

所以在“员工信息表”里不断排除范围可能比让其他表互相做JOIN更快。

我原本也是这个思路:根据num取得uid,然后不断缩小uid数量

但仔细想了想,觉得目前这个帖子里,没有一个SQL是按照这个思路执行的。

因为,所有的表结构,都是(num, uid)。上一步积累的uid,没法用到下一步的过滤

(我觉得B表可以改成(uid, num),来利用上A表查到的uid结果列表)

(我再想想。。)

不这么干,而直接全部变成(uid, num),就等着扫全表,来一个个用户检查“是否有num”吧

SQLite倒是有个“跳跃扫描”,可以在一定程度上不遵循“最左原则”。但这样也是 1kw 次 uid: 1~1kw, num = ? 次匹配,和扫全表比,谁快还真的说不定)

(/@Ta/2022-08-16 02:02//)

163. 1.现在的sqlite3、postgresql都有json类型,直接内嵌结构,很多场景不需要链表。
2.需要链表的地方,基本都有现成的数据,不需要去查那么一下。
3.拆为代码层面多次查询,有助于封装成功能函数,添加缓存啥的,避免无所谓的重复代码和查询。
(/@Ta/2022-08-16 11:05//)

164.

@爱特

  1. 链表是啥?这样么?

    主键ID uid num 下一节点ID
    1 1 1 3
    2 2 1 0
    3 1 2 0
  2. 还是没懂你的“链表”,等你回复

  3. “添加缓存”是指啥?如:num = 1时,有uid: [1, 2]

(/@Ta/2022-08-17 01:00//)

165. 1.链表就是联表,因为我经常输入链表,输入法匹配到第一个就是这个。
2.如果仅是这个需求,我会直接查询字段加索引,直接num = 1解决。
3.实际上几乎所有增量数据都会涉及时间,具体业务具体实现,首次查询缓存,后续事件方式更新缓存,不需要再操作数据库。
4.当然,你给的这个表结构,纯int,基本也不会有性能问题,怎么搞都一样,如果这个num相当于类型是动态的,我会直接json字段解决问题。
不回复了,纠结这个意义不大,真遇到性能问题,十有八九都是冗余字段、事件总线方式去解决,我也是mongo用习惯了,用回sql也是经常使用json字段嵌结构进去,贼方便。
(/@Ta/2022-08-17 11:31//)

上一页 9/9页,共165楼

回复需要登录

10月7日 21:50 星期二

本站由hu60wap6驱动

备案号: 京ICP备18041936号-1