Big Data Updates: August 2015

Cloudera:
Cloudera Navigator roadmap
http://blog.cloudera.com/blog/2015/08/whats-next-for-apache-hadoop-data-management-and-governance-cloudera-navigator-roadmap/
YCSB, the open standard benchmark suite for NoSQL performance, joins Cloudera Labs
http://blog.cloudera.com/blog/2015/08/ycsb-the-open-standard-for-nosql-benchmarking-joins-cloudera-labs/
Case study: Spark for machine learning (massively parallel NLP) at TripAdvisor
http://blog.cloudera.com/blog/2015/08/using-apache-spark-for-massively-parallel-nlp-at-tripadvisor/
Running Apache Mesos on CDH
http://blog.cloudera.com/blog/2015/08/how-to-run-apache-mesos-on-cdh/
HBase gains the HBase-Spark module
http://blog.cloudera.com/blog/2015/08/apache-spark-comes-to-apache-hbase-with-hbase-spark-module/
Securing YARN containers with Cloudera Navigator Encrypt
http://blog.cloudera.com/blog/2015/08/how-to-secure-yarn-containers-with-cloudera-navigator-encrypt/
Case study: Santander's near-real-time data ingest architecture based on Kafka and HBase
http://blog.cloudera.com/blog/2015/08/inside-santanders-near-real-time-data-ingest-architecture/

Hortonworks:
The Hortonworks Sandbox with HDP 2.3 is now available in the Microsoft Azure Gallery
http://hortonworks.com/blog/hortonworks-sandbox-with-hdp-2-3-is-now-available-on-microsoft-azure-gallery/
Spark support on Microsoft Azure
http://hortonworks.com/blog/microsoft-and-hortonworks-do-spark-in-the-cloud/
Fault-tolerant Nimbus architecture in Apache Storm
http://hortonworks.com/blog/fault-tolerant-nimbus-in-apache-storm/

MapR:
Spark Streaming with HBase
https://www.mapr.com/blog/spark-streaming-hbase
Apache Drill Architecture: The Ultimate Guide
https://www.mapr.com/blog/apache-drill-architecture-ultimate-guide
An in-depth look at the HBase architecture
https://www.mapr.com/blog/in-depth-look-hbase-architecture
Guidelines for HBase schema design
https://www.mapr.com/blog/guidelines-hbase-schema-design
Parallel and iterative processing for machine learning recommendations with Spark
https://www.mapr.com/blog/parallel-and-iterative-processing-machine-learning-recommendations-spark

Databricks:
A preview of Spark 1.5 is available in Databricks. It includes Tungsten, which uses code generation and cache-aware algorithms to substantially improve runtime performance:
https://databricks.com/blog/2015/08/18/spark-1-5-preview-now-available-in-databricks.html
https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html

MongoDB:
MongoDB published two 2.x releases and three 3.x releases:
http://blog.mongodb.org/post/128063809158/mongodb-306-rc2-is-released
http://blog.mongodb.org/post/127802855483/mongodb-317-is-released
http://blog.mongodb.org/post/126436298628/mongodb-2611-is-released
http://blog.mongodb.org/post/126436227873/mongodb-306-rc0-is-released
http://blog.mongodb.org/post/125850939688/mongodb-2611-rc0-is-released

Redis:

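References:
Classification of NoSQL databases
http://www.nosql-database.org/
Autodesk's general-purpose event system architecture based on Mesos
http://www.csdn.net/article/2015-08-27/2825550
QingCloud launches Spark as a service
http://mt.sohu.com/20150826/n419752360.shtml
Core components of the Spark big data analytics framework
http://my.oschina.net/u/2306127/blog/489024?p=1
Hadoop and big data: 60 top open source tools
http://os.51cto.com/art/201508/487936.htm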
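[WeChat talk] Zhou Xiaosi of QingCloud: a brief talk on learning Spark
http://www.csdn.net/article/2015-08-07/2825404
[WeChat talk] Li Tao: news and ad recommendation with Spark at Sohu in practice
http://www.csdn.net/article/2015-07-31/2825353
[WeChat talk] Wang Tuanjie: how Qiniu handles 50 billion log entries per day
http://www.csdn.net/article/2015-07-30/2825342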
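Thoughts on log processing at Qiniu Cloud Storage
http://hadoop1989.com/2015/08/02/Think-QiNiu-Cloud/
Storm in online production: troubleshooting a CPU spike on an idle cluster
http://daiwa.ninja/index.php/2015/07/18/storm-cpu-overload/
Spark and Flink: comparison and analysis
http://www.csdn.net/article/2015-07-16/2825232
81 in total: a roundup of open source big data processing tools (part 1)
http://www.36dsj.com/archives/24852
81 in total: a roundup of open source big data processing tools (part 2)
http://home.hylanda.com/show_26_11558.html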

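Summary:

1. Both Cloudera and Hortonworks have begun to focus on data management and data governance: Cloudera by enhancing Cloudera Navigator, Hortonworks by introducing the Informatic component Fabric.
2. Spark 1.5 became available as a preview.
3. HBase and Cassandra are column family / wide column stores.
4. MongoDB is a document store.
5. Redis is a key-value / tuple store.
6. Neo4j is a graph database.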
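
The four categories in items 3 to 6 differ mainly in how a record is shaped and looked up. A minimal sketch of that difference, using plain Python structures rather than any real client library; every key, field name, and value below is hypothetical and chosen only for illustration:

```python
# Illustrative only: plain Python structures standing in for each data model.
# No real database client is used; all names and values are made up.

# Wide column store (e.g. HBase, Cassandra):
# row key -> column family -> column qualifier -> value
wide_column = {
    "user#1001": {
        "profile": {"name": "Ann", "city": "Madrid"},
        "activity": {"last_login": "2015-08-27"},
    }
}

# Document store (e.g. MongoDB): a collection of nested, self-describing documents
documents = [
    {"_id": 1001, "name": "Ann", "tags": ["spark", "hbase"], "address": {"city": "Madrid"}},
]

# Key-value store (e.g. Redis): a value fetched by its key, with no schema on the value
key_value = {"session:1001": "token-abc"}

# Graph database (e.g. Neo4j): nodes plus typed, directed edges between them
nodes = {1001: {"name": "Ann"}, 1002: {"name": "Bob"}}
edges = [(1001, "FOLLOWS", 1002)]


def followers_of(node_id):
    """Return the ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [src for src, rel, dst in edges if rel == "FOLLOWS" and dst == node_id]


def find_by_tag(tag):
    """Return the _id of every document whose tags list contains tag."""
    return [doc["_id"] for doc in documents if tag in doc.get("tags", [])]


if __name__ == "__main__":
    print(wide_column["user#1001"]["profile"]["city"])  # lookup by row, family, column
    print(find_by_tag("spark"))                         # query on a nested field
    print(key_value["session:1001"])                    # lookup by key only
    print(followers_of(1002))                           # traversal over edges
```

The point of the sketch is the access pattern: the wide column lookup needs a row key plus family and column, the document query filters on a field inside the record, the key-value lookup knows nothing about the value, and the graph query walks relationships rather than records.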