NoSQL Databases: A First Look at MongoDB

2020-11-09 07:21:56 Editor: 小采

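Following the NoSQL trend that is so hot right now, I spent some time learning MongoDB. These are my notes; I hope to explore it further together with anyone who is interested!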

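MongoDB Overview

MongoDB is written in C++. Its name comes from the middle of the word "humongous", and it is developed and maintained by 10gen. The most concise description of it is: scalable, high-performance, open source, schema-free, document-oriented database. MongoDB's main goal is to bridge the gap between key/value stores (which offer high performance and high scalability) and traditional RDBMS systems (which offer rich functionality), combining the strengths of both.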

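MongoDB features:

- Document-oriented storage
- Full index support, extending to inner objects and embedded arrays
- Replication and high availability
- Auto-sharding for cloud-level scalability
- Query profiling
- Dynamic queries
- Fast in-place updates
- Map/Reduce support
- GridFS file system
- Commercial support, training, and consulting

Official site: http://www.mongodb.org/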

Configuration

Master-slave mode

Host	IP	Role
test001	192.168.1.1	master
test002	192.168.1.2	slave
test003	192.168.1.3	slave
test004	192.168.1.4	slave
test005	192.168.1.5	slave
test006	192.168.1.6	slave

Start the master:

`./mongod --dbpath=/mongodb/data/ --logpath=/mongodb/logs/mongodb.log --oplogSize=10000 --logappend --master --port=27017 --fork`

Add the repl user:

./mongo

>use local

> db.addUser('repl','replication');

Start the slaves:

`./mongod --dbpath=/mongodb/data/ --logpath=/mongodb/logs/mongodb.log --slave --port=27017 --source=test001:27017 --autoresync --fork`

Add the repl user:

./mongo

>use local

> db.addUser('repl','replication');

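The autoresync option automatically restarts replication when something unexpected leaves the master and a slave out of sync (a full resync is performed at most once every 10 minutes). In addition, --slavedelay sets, in seconds, how long the slave waits before applying updates from the master.

A master-slave setup is commonly used to separate reads from writes, but reading from a slave requires setting slaveOk.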

slaveOk

When querying a replica pair or replica set, drivers route their requests to the master mongod by default; to perform a query against an (arbitrarily-selected) slave, the query can be run with the slaveOk option. Here’s how to do so in the shell:

db.getMongo().setSlaveOk(); // enable querying a slave
db.users.find(...)

Note: some language drivers permit specifying the slaveOk option on each find(), others make this a connection-wide setting. See your language’s driver for details.
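To make the routing rule concrete, here is a minimal in-memory sketch of it. This is not driver code: `pickReadTarget` and the `role` field are hypothetical names introduced only for illustration, and the host list reuses part of the master-slave table above.

```javascript
// Hypothetical model of read routing; not a real driver API.
function pickReadTarget(nodes, slaveOk, random = Math.random) {
  const master = nodes.find((n) => n.role === "master");
  const slaves = nodes.filter((n) => n.role === "slave");
  // Without slaveOk (or when no slave exists) reads go to the master.
  if (!slaveOk || slaves.length === 0) return master;
  // With slaveOk, an arbitrarily selected slave serves the read.
  return slaves[Math.floor(random() * slaves.length)];
}

const nodes = [
  { host: "test001:27017", role: "master" },
  { host: "test002:27017", role: "slave" },
  { host: "test003:27017", role: "slave" },
];

console.log(pickReadTarget(nodes, false).host); // always the master
console.log(pickReadTarget(nodes, true).host);  // one of the slaves
```

Passing the random source as a parameter keeps the selection testable; a real driver makes this choice internally.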

Replica Set mode

Replica Sets use n mongod nodes to build a high-availability setup with automatic failover and automatic recovery.

Host	IP	Role
test001	192.168.1.1	secondary
test002	192.168.1.2	secondary
test003	192.168.1.3	primary
test004	192.168.1.4	secondary
test005	192.168.1.5	secondary
test006	192.168.1.6	secondary
test007	192.168.1.7	secondary

Start each node:

`./mongod --dbpath=/mongodb/data/ --logpath=/mongodb/logs/mongodb.log --oplogSize=10000 --logappend --replSet set1 --port=27017 --fork --rest`

Add the repl user:

./mongo

>use local

> db.addUser('repl','replication');

Configure the set:

config={_id:'set1',members:[

{_id:0,host:'test001:27017'},

{_id:1,host:'test002:27017'},

{_id:2,host:'test003:27017'},

{_id:3,host:'test004:27017'},

{_id:4,host:'test005:27017'},

{_id:5,host:'test006:27017'},

{_id:6,host:'test007:27017'}]

}

rs.initiate(config);
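Writing the member list by hand gets tedious as the set grows. As a sketch, the same configuration document can be generated from a list of hosts; `buildReplSetConfig` is a hypothetical helper, not part of the mongo shell, and it only builds the plain object that would be passed to rs.initiate().

```javascript
// Build the document passed to rs.initiate() from a list of hosts.
// Member _id values are simply the positions in the list.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName,
    members: hosts.map((host, i) => ({ _id: i, host })),
  };
}

// The seven hosts from the table above: test001 .. test007.
const hosts = [];
for (let i = 1; i <= 7; i++) hosts.push(`test00${i}:27017`);

const config = buildReplSetConfig("set1", hosts);
console.log(JSON.stringify(config, null, 1));
```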

Check the status:

Visit http://test001:28017/_replSet

or

./mongo

> rs.status()

{

"set" : "set1",

"date" : "Fri Dec 03 2010 00:57:44 GMT+0800 (CST)",

"myState" : 2,

"members" : [

{

"_id" : 0,

"name" : "test001:27017",

"health" : 1,

"state" : 2,

"self" : true

},

{

"_id" : 1,

"name" : "test002:27017",

"health" : 1,

"state" : 2,

"uptime" : 194451,

"lastHeartbeat" : "Fri Dec 03 2010 00:57:42 GMT+0800 (CST)"

},

{

"_id" : 2,

"name" : "test003:27017",

"health" : 1,

"state" : 1,

"uptime" : 1946,

"lastHeartbeat" : "Fri Dec 03 2010 00:57:43 GMT+0800 (CST)"

},

{

"_id" : 3,

"name" : "test004:27017",

"health" : 1,

"state" : 2,

"uptime" : 1946,

"lastHeartbeat" : "Fri Dec 03 2010 00:57:42 GMT+0800 (CST)"

},

{

"_id" : 4,

"name" : "test005:27017",

"health" : 1,

"state" : 2,

"uptime" : 1946,

"lastHeartbeat" : "Fri Dec 03 2010 00:57:42 GMT+0800 (CST)"

},

{

"_id" : 5,

"name" : "test006:27017",

"health" : 1,

"state" : 2,

"uptime" : 1946,

"lastHeartbeat" : "Fri Dec 03 2010 00:57:43 GMT+0800 (CST)"

},

{

"_id" : 6,

"name" : "test007:27017",

"health" : 1,

"state" : 2,

"uptime" : 1946,

"lastHeartbeat" : "Fri Dec 03 2010 00:57:42 GMT+0800 (CST)"

}

],

"ok" : 1

}
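In this output, state 1 marks the primary and state 2 a secondary. A small sketch of how a monitoring script might condense such a document follows; `summarize` is a hypothetical helper, and the sample object is an abbreviated copy of the status shown above.

```javascript
// Condense an rs.status()-style document: 1 = PRIMARY, 2 = SECONDARY.
function summarize(status) {
  const primary = status.members.find((m) => m.state === 1);
  const secondaries = status.members.filter((m) => m.state === 2);
  const unhealthy = status.members.filter((m) => m.health !== 1);
  return {
    primary: primary ? primary.name : null,
    secondaryCount: secondaries.length,
    unhealthy: unhealthy.map((m) => m.name),
  };
}

// Abbreviated copy of the status document above (first three members only).
const status = {
  set: "set1",
  myState: 2,
  members: [
    { _id: 0, name: "test001:27017", health: 1, state: 2, self: true },
    { _id: 1, name: "test002:27017", health: 1, state: 2 },
    { _id: 2, name: "test003:27017", health: 1, state: 1 },
  ],
};

console.log(summarize(status));
```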

After an operation on a Replica Set, call getlasterror so that the write returns only after it has been replicated to at least 3 machines:

db.runCommand( { getlasterror : 1 , w : 3 } )
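The meaning of `w` can be sketched with a tiny model: the call is satisfied only once the number of distinct nodes holding the write, the primary included, reaches `w`. `isAcknowledged` is a hypothetical function used purely for illustration; the real server tracks this internally.

```javascript
// `applied` lists the nodes that already hold the write (primary included).
function isAcknowledged(applied, w) {
  return new Set(applied).size >= w;
}

console.log(isAcknowledged(["test003"], 3));                       // primary only
console.log(isAcknowledged(["test003", "test001", "test002"], 3)); // primary + 2 secondaries
```

A larger `w` trades write latency for durability, because the caller waits for more secondaries to catch up.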

Note: in the version tested here (1.6.3), this mode does not support auth. If you need auth, choose master-slave mode.

Sharding mode

Building a MongoDB Sharding Cluster requires three roles:

Shard Server: a mongod instance that stores the actual data chunks.

Config Server: a mongod instance that stores the metadata of the whole cluster, including chunk information.

Route Server: a mongos instance acting as the front-end router. Clients connect through it, and it makes the whole cluster look like a single database process.

Host	IP	Role
test002	192.168.1.2	mongod shard11:27017
test003	192.168.1.3	mongod shard21:27017
test004	192.168.1.4	mongod shard31:27017
test005	192.168.1.5	mongod config1:20000 mongos1:30000
test006	192.168.1.6	mongod config2:20000 mongos2:30000
test007	192.168.1.7	mongod config3:20000 mongos3:30000
test008	192.168.1.8	mongod shard12:27017
test009	192.168.1.9	mongod shard22:27017
test010	192.168.1.10	mongod shard32:27017

Shard configuration

Shard1

[test002; test008]

test002:

`./mongod --shardsvr --replSet shard1 --port 27017 --dbpath /mongodb/data/shard11 --oplogSize 10000 --logpath /mongodb/logs/shard11.log --logappend --fork`

test008:

`./mongod --shardsvr --replSet shard1 --port 27017 --dbpath /mongodb/data/shard12 --oplogSize 10000 --logpath /mongodb/logs/shard12.log --logappend --fork`

Initialize shard1

config={_id:'shard1',members:[

{_id:0,host:'test002:27017'},

{_id:1,host:'test008:27017'}]

}

rs.initiate(config);

Shard2

[test003; test009]

test003:

`./mongod --shardsvr --replSet shard2 --port 27017 --dbpath /mongodb/data/shard21 --oplogSize 10000 --logpath /mongodb/logs/shard21.log --logappend --fork`

test009:

`./mongod --shardsvr --replSet shard2 --port 27017 --dbpath /mongodb/data/shard22 --oplogSize 10000 --logpath /mongodb/logs/shard22.log --logappend --fork`

Initialize shard2

config={_id:'shard2',members:[

{_id:0,host:'test003:27017'},

{_id:1,host:'test009:27017'}]

}

rs.initiate(config);

Shard3

[test004; test010]

test004:

`./mongod --shardsvr --replSet shard3 --port 27017 --dbpath /mongodb/data/shard31 --oplogSize 10000 --logpath /mongodb/logs/shard31.log --logappend --fork`

test010:

`./mongod --shardsvr --replSet shard3 --port 27017 --dbpath /mongodb/data/shard32 --oplogSize 10000 --logpath /mongodb/logs/shard32.log --logappend --fork`

Initialize shard3

config={_id:'shard3',members:[

{_id:0,host:'test004:27017'},

{_id:1,host:'test010:27017'}]

}

rs.initiate(config);

Config server configuration

[test005; test006; test007]

`./mongod --configsvr --dbpath /mongodb/data/config --port 20000 --logpath /mongodb/logs/config.log --logappend --fork`

mongos configuration

[test005; test006; test007]

`./mongos --configdb test005:20000,test006:20000,test007:20000 --port 30000 --chunkSize 5 --logpath /mongodb/logs/mongos.log --logappend --fork`

The route process forwards each request to the actual target server processes and merges the results before returning them to the client. The route process itself stores no data or state; it only loads information from the config servers at startup. Any change on the config servers is propagated to all route processes.
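The forward-and-merge behaviour of the router can be sketched in a few lines. Everything here is an assumption made for illustration: the chunk boundaries are made up, and `shardFor`, `shardsForRange`, and `mergeById` are hypothetical names rather than mongos internals.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key
// and lives on exactly one shard. The boundaries are invented.
const chunks = [
  { min: -Infinity, max: 1000, shard: "s1" },
  { min: 1000, max: 2000, shard: "s2" },
  { min: 2000, max: Infinity, shard: "s3" },
];

// A lookup on a single shard key value goes to exactly one shard.
function shardFor(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// A range query on [lo, hi] goes to every shard owning an overlapping chunk.
function shardsForRange(lo, hi) {
  const hit = chunks.filter((c) => c.min <= hi && c.max > lo);
  return [...new Set(hit.map((c) => c.shard))];
}

// The router combines per-shard results; here by flattening and sorting on _id.
function mergeById(perShardResults) {
  return perShardResults.flat().sort((a, b) => a._id - b._id);
}

console.log(shardFor(1500));
console.log(shardsForRange(500, 1500));
console.log(mergeById([[{ _id: 3 }], [{ _id: 1 }, { _id: 7 }]]));
```

A query that includes the shard key touches few shards, while one that does not must be sent to all of them, which is one plausible reason for slow selects in a small test cluster.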

Configuring the Shard Cluster

1. Connect to the admin database

`./mongo test005:30000/admin`

2. Add the shards

db.runCommand({addshard:"shard1/test002:27017,test008:27017",name:"s1",maxsize:20480});

db.runCommand({addshard:"shard2/test003:27017,test009:27017",name:"s2",maxsize:20480});

db.runCommand({addshard:"shard3/test004:27017,test010:27017",name:"s3",maxsize:20480});

3. List the shards

`db.runCommand({listshards:1})`

If the three shards above are listed, the shards have been configured successfully.

4. Enable sharding for the database and its collections

`db.runCommand({enablesharding:"taobao"});`

`db.runCommand({shardcollection:"taobao.test0",key:{_id:1}});`

`db.runCommand({shardcollection:"taobao.test1",key:{_id:1}});`

Usage

Working with the database from the shell

Administrative commands:

1) Switch to the admin database

`use admin`

2) Add a user or change a user's password

`db.addUser('name','pwd')`

3) List the users

`db.system.users.find()`

4) Authenticate a user

`db.auth('name','pwd')`

5) Remove a user

`db.removeUser('name')`

6) Show all users

`show users`

7) Show all databases

`show dbs`

8) Show all collections

`show collections`

9) Show the status of each collection

`db.printCollectionStats()`

10) Show the master-slave replication status

`db.printReplicationInfo()`

11) Repair the database

`db.repairDatabase()`

12) Set the profiling level: 0=off, 1=slow operations, 2=all operations

`db.setProfilingLevel(1)`

13) View the profiling data

`show profile`

14) Copy a database

`db.copyDatabase('mail_addr','mail_addr_tmp')`

15) Drop a collection

`db.mail_addr.drop()`

16) Drop the current database

`db.dropDatabase()`

Insert, delete, and update:

1) Insert

db.user.insert({'name':'dump','age':1})

or

db.user.save({'name':'dump','age':1})

Nested objects:

`db.foo.save({'name':'dump','address':{'city':'hangzhou','post':310015},'phone':[138888888,13999999999]})`

Array values:

`db.user_addr.save({'Uid':'dump','Al':['test-1@taobao.com','test-2@taobao.com']})`

2) Delete

Delete the user records where name='dump':

`db.user.remove({'name':'dump'})`

Delete every record in the foo collection:

`db.foo.remove()`

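3) Update

// update foo set xx=4 where yy=6

// The third argument (upsert=true) inserts a document if none matches; the fourth (multi=true) allows updating several documents. They are positional booleans.

`db.foo.update({'yy':6},{'$set':{'xx':4}},true,true)`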
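The interplay of the upsert and multi flags is easy to get wrong, so here is a minimal in-memory sketch of the semantics. It assumes equality-only queries and a `$set`-style change, and the `update` function below is a hypothetical stand-in operating on a plain array, not the shell method (a real upsert would also assign an `_id`).

```javascript
// In-memory sketch of update(query, {$set: fields}, upsert, multi).
function update(docs, query, setFields, upsert, multi) {
  const matches = docs.filter((d) =>
    Object.keys(query).every((k) => d[k] === query[k])
  );
  // multi=false touches only the first match; multi=true touches all of them.
  const targets = multi ? matches : matches.slice(0, 1);
  targets.forEach((d) => Object.assign(d, setFields));
  // upsert=true inserts query fields plus set fields when nothing matched.
  if (matches.length === 0 && upsert) docs.push({ ...query, ...setFields });
  return docs;
}

const foo = [{ yy: 6, xx: 1 }, { yy: 6, xx: 2 }, { yy: 7, xx: 3 }];
update(foo, { yy: 6 }, { xx: 4 }, true, true); // both yy=6 documents get xx=4
update(foo, { yy: 9 }, { xx: 4 }, true, true); // no match, so a document is inserted
console.log(foo);
```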

Queries:

coll.find() // select * from coll

coll.find().limit(10) // select * from coll limit 10

coll.find().sort({x:1}) // select * from coll order by x asc

coll.find().sort({x:1}).skip(5).limit(10) // select * from coll order by x asc limit 5, 10

coll.find({x:10}) // select * from coll where x = 10

coll.find({x: {$lt:10}}) // select * from coll where x < 10

coll.find({}, {y:true}) // select y from coll

coll.count() //select count(*) from coll
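One detail of the cursor chain above is worth spelling out: sort is applied first, then skip, then limit, which is what makes `sort({x:1}).skip(5).limit(10)` equivalent to `order by x asc limit 5, 10`. The sketch below reproduces that order on a plain array; `find` here is a hypothetical stand-in, not the shell method.

```javascript
// In-memory sketch of the sort -> skip -> limit pipeline of a cursor.
function find(docs, { sortBy, skip = 0, limit = Infinity } = {}) {
  const out = [...docs]; // leave the input array untouched
  if (sortBy) out.sort((a, b) => a[sortBy] - b[sortBy]); // ascending, like {x:1}
  return out.slice(skip, skip + limit);
}

// Twenty documents with x = 20, 19, ..., 1 (deliberately unsorted).
const coll = Array.from({ length: 20 }, (_, i) => ({ x: 20 - i }));

const page = find(coll, { sortBy: "x", skip: 5, limit: 10 });
console.log(page.map((d) => d.x)); // x values 6 through 15
```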

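Other queries:

coll.find({"address.city":"gz"}) // match documents whose embedded address document has city "gz"

coll.find({likes:"math"}) // match a value inside an array

coll.find({name: {$exists: true}}); // all documents that have a name field

coll.find({phone: {$exists: false}}); // all documents that do not have a phone field

coll.find({name: {$type: 2}}); // all documents whose name field is a string

coll.find({age: {$type: 16}}); // all documents whose age field is a 32-bit integer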
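The dotted-path, array, and `$exists` rules used above can be illustrated with a small matcher. This is only a sketch of the matching semantics under simplifying assumptions (equality and `$exists` only); `getPath` and `matches` are hypothetical helpers, not MongoDB code.

```javascript
// Resolve a dotted path such as "address.city" inside a document.
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Match a document against a query of dotted paths and simple conditions.
function matches(doc, query) {
  return Object.entries(query).every(([path, cond]) => {
    const value = getPath(doc, path);
    if (cond !== null && typeof cond === "object" && "$exists" in cond) {
      return (value !== undefined) === cond.$exists;
    }
    // An array field matches when any of its elements equals the condition.
    return Array.isArray(value) ? value.includes(cond) : value === cond;
  });
}

const people = [
  { name: "a", address: { city: "gz" }, likes: ["math", "art"] },
  { address: { city: "hz" }, phone: 123 },
];

console.log(people.filter((d) => matches(d, { "address.city": "gz" })).length);
console.log(people.filter((d) => matches(d, { likes: "math" })).length);
console.log(people.filter((d) => matches(d, { phone: { $exists: false } })).length);
```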

Indexes:

Index direction: 1 (ascending), -1 (descending)

coll.ensureIndex({productid:1}) // create a regular index on productid

coll.ensureIndex({district:1, plate:1}) // compound index on several fields

coll.ensureIndex({"address.city":1}) // index on a field of an embedded document

coll.ensureIndex({productid:1}, {unique:true}) // unique index

coll.ensureIndex({productid:1}, {unique:true, dropDups:true}) // while building the index, delete any document whose key value has already been seen

coll.getIndexes() // list the indexes

coll.dropIndex({productid:1}) // drop a single index
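Because dropDups deletes data, it helps to see exactly what it does. The sketch below models it on a plain array: the first document seen for each key value is kept and later ones are discarded. `dropDups` here is a hypothetical function, and which document counts as "first" on a real server depends on storage order, so treat this as an illustration only.

```javascript
// Keep the first document seen for each key value; drop later duplicates.
function dropDups(docs, field) {
  const seen = new Set();
  return docs.filter((d) => {
    if (seen.has(d[field])) return false; // duplicate key value: dropped
    seen.add(d[field]);
    return true;
  });
}

const products = [
  { productid: 1, title: "first" },
  { productid: 2, title: "other" },
  { productid: 1, title: "duplicate" },
];

console.log(dropDups(products, "productid"));
```

Without dropDups, building a unique index over data like this fails instead, which is usually the safer default.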

MongoDB Drivers

C++

Haskell

Java

Javascript

Perl

PHP

Python

Ruby

Scala (via Casbah)

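MongoDB offers client programming APIs for many languages. Because the dump center is built on Hadoop, this article focuses on the Java API, and the test programs below also use the Java API.

MongoDB in Java

Download the MongoDB Java driver and drop the jar (mongo-2.3.jar) into your project; that is all it takes.

In Java, the Mongo object is thread-safe, and an application should use only one Mongo instance. The Mongo object maintains a connection pool automatically; the default number of connections is 10.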

import com.mongodb.*;

try {
    Mongo mg = new Mongo(server_lists); // server_lists is a List of server addresses
    DB db = mg.getDB("taobao");
    if (!db.isAuthenticated()) {
        db.authenticate("name", "pwd".toCharArray());
    }
    DBCollection coll = db.getCollection("category_property_values");
    coll.slaveOk(); // required in replica set mode; otherwise every query is sent to the primary only

    // insert
    BasicDBObject doc = new BasicDBObject();
    // set the field values
    doc.put("name", "MongoDB");
    doc.put("type", "database");
    coll.insert(doc);
    ……

    // select
    // fetch a single document
    BasicDBObject criteria = new BasicDBObject();
    criteria.put("name", "MongoDB");
    DBObject result = coll.findOne(criteria);
    ……

    // query with a cursor
    DBCursor cur = coll.find(criteria);
    while (cur.hasNext()) {
        cur.next();
        ……
    }
    ……

    // update: the document matching qlist is replaced by dblist
    DBObject dblist = new BasicDBObject();
    DBObject qlist = new BasicDBObject();
    qlist.put("_id", j);
    dblist.put("t1", str);
    coll.update(qlist, dblist);
    ……

    // delete
    DBObject dlist = new BasicDBObject();
    dlist.put("_id", j);
    coll.remove(dlist);
} catch (MongoException ex) {
    // handle or log the exception here
}

MongoDB benchmarks

Version tested: 1.6.3

A single thread inserted 1 million, 3 million, 5 million, and 10 million records; in the multi-threaded runs, each thread inserted 1 million records.

Format of the inserted data:

`{ "_id" : NumberLong(16), "nid" : NumberLong(16), "t1" : "search_engine_insert", "t2" : "search_engine_insert", "t3" : "search_engine_insert", "t4" : "search_engine_insert" }`

1) Master-slave mode

Insert

Rows per thread	Run time (s)	Inserts/s per thread	Total inserts/s	Total rows	Threads
1000000	20	50000	50000	1000000	1
3000000	60	50000	50000	3000000	1
5000000	99	50505	50505	5000000	1
8000000	159	50314	50314	8000000	1
10000000	208	48076	48076	10000000	1
1000000	64	15625	31250	2000000	2

In MongoDB, only the master node can perform insert and update operations.
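The rate columns in these tables follow directly from the row counts and run times. As a sketch, assuming the run time is measured in seconds and the rates are truncated to whole numbers, the arithmetic looks like this; `throughput` is a hypothetical helper written for this explanation.

```javascript
// Derive the per-thread and total rates from rows, run time, and thread count.
function throughput(rowsPerThread, seconds, threads) {
  const totalRows = rowsPerThread * threads;
  return {
    perThread: Math.floor(rowsPerThread / seconds), // rows/s seen by one thread
    total: Math.floor(totalRows / seconds),         // rows/s across all threads
    totalRows,
  };
}

const single = throughput(1000000, 20, 1); // first insert row of the table above
const dual = throughput(1000000, 64, 2);   // two threads, 64 s run time

console.log(single.perThread, single.total, single.totalRows);
console.log(dual.perThread, dual.total, dual.totalRows);
```

Reading the tables this way shows that adding threads lowers the per-thread rate, so the total rate is the number to compare across runs.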

Update

Data format:

`{ "_id" : NumberLong(16), "nid" : NumberLong(16), "t1" : "search_engine_update", "t2" : "search_engine_update", "t3" : "search_engine_update", "t4" : "search_engine_update" }`

Rows per thread	Run time (s)	Updates/s per thread	Total updates/s	Total rows	Threads
1000000	96	10416	10416	1000000	1
3000000	287	10452	10452	3000000	1
1000000	188	5319	15957	3000000	3
1000000	351	2849	14245	5000000	5

Select

Records are looked up by the "_id" field, and the whole record is returned.

a) Client: one machine, multiple threads

Rows per thread	Run time (s)	Selects/s per thread	Total selects/s	Total rows	Threads
1000000	72	13888	13888	1000000	1
1000000	129	7751	77519	10000000	10
1000000	554	1805	90252	50000000	50
1000000	1121	892	89206	100000000	100
1000000	2256	443	88652	200000000	200

b) Client: distributed, multiple threads

The client program was deployed on 39 machines.

Rows per thread	Run time (s)	Selects/s per thread	Total selects/s	Total rows	Threads per machine
1000000	173	5780	5780*39=225420	1000000*39	1
1000000	1402	713	7132*39=278148	10000000*39	10
500000	1406	355	7112*39=277368	10000000*39	20
200000	1433	139	6978*39=272142	10000000*39	50

2) Replica Set mode

Insert

Rows per thread	Run time (s)	Inserts/s per thread	Total inserts/s	Total rows	Threads
1000000	40	25000	25000	1000000	1
3000000	117	25641	25641	3000000	1
5000000	211	23696	23696	5000000	1
8000000	289	27681	27681	8000000	1
10000000	388	25773	25773	10000000	1
1000000	83	12048	24096	2000000	2
1000000	210	4762	23809	5000000	5

Update

Rows per thread	Run time (s)	Updates/s per thread	Total updates/s	Total rows	Threads
1000000	28	35714	35714	1000000	1
3000000	83	36144	36144	3000000	1
1000000	146	6849	20547	3000000	3
1000000	262	3816	19083	5000000	5

Select

Records are looked up by the "_id" field, and the whole record is returned.

a) Client: one machine, multiple threads

Rows per thread	Run time (s)	Selects/s per thread	Total selects/s	Total rows	Threads
1000000	198	5050	5050	1000000	1
1000000	264	3787	37878	10000000	10
1000000	436	2293	114678	50000000	50
1000000	754	1326	132625	100000000	100
1000000	1526	655	131061	200000000	200

b) Client: distributed, multiple threads

The client program was deployed on 39 machines.

Rows per thread	Run time (s)	Selects/s per thread	Total selects/s	Total rows	Threads per machine
1000000	216	4629	4629*39=180531	1000000*39	1
1000000	1375	729	7293*39=284427	10000000*39	10
500000	1469	340	6807*39=265473	10000000*39	20
200000	1561	128	6406*39=249834	10000000*39	50

3) Sharding mode

Insert

Rows per thread	Run time (s)	Inserts/s per thread	Total inserts/s	Total rows	Threads
1000000	58	17241	17241	1000000	1
3000000	180	16666	16666	3000000	1
5000000	373	13404	13404	5000000	1
2000000	234	8547	17094	4000000	2
2000000	447	4474	22371	10000000	5

Update

Rows per thread	Run time (s)	Updates/s per thread	Total updates/s	Total rows	Threads
1000000	38	26315	26315	1000000	1
3000000	115	26086	26086	3000000	1
1000000	64	15625	46875	3000000	3
1000000	93	10752	53763	5000000	5

Select

Records are looked up by the "_id" field, and the whole record is returned.

a) Client: one machine, multiple threads

Rows per thread	Run time (s)	Selects/s per thread	Total selects/s	Total rows	Threads
1000000	277	3610	3610	1000000	1
1000000	456	2192	21929	10000000	10
1000000	1158	863	43177	50000000	50
1000000	2299	434	43497	100000000	100

b) Client: distributed, multiple threads

The client program was deployed on 39 machines.

Rows per thread	Run time (s)	Selects/s per thread	Total selects/s	Total rows	Threads per machine
1000000	659	1517	1517*39=59163	1000000*39	1
1000000	8540	117	1170*39=45630	10000000*39	10

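Summary:

MongoDB's query performance in M-S and Repl-Set modes is quite good. The difference is that in Repl-Set mode, if the primary node goes down, the system elects another primary on its own, so later use is not affected, and the old primary automatically becomes a secondary once it recovers. In M-S mode, once the master goes down, one of the slaves has to be reconfigured as master by hand. Also, a Repl-Set can have at most 7 voting nodes.

Because query speed dropped noticeably in sharding mode and the runs took too long, only 2 rounds were tested. Its strength probably shows only with very large data volumes. The numbers above are for reference only, since this was just a simple test. Next I plan to study the source code; anyone interested is welcome to get in touch!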
