Big Data Updates: September 2015

Apache Kylin
Apache Kylin v1.0 released: a distributed analytics engine
http://www.oschina.net/news/65938/apache-kylin-1-0-released

Apache Calcite
Apache Calcite: a new big data query engine in Hadoop
http://www.infoq.com/cn/articles/new-big-data-hadoop-query-engine-apache-calcite

Apache Flink
http://flink.apache.org/

Cloudera
Kudu, a new Hadoop storage component for fast data:
http://blog.cloudera.com/blog/2015/09/kudu-new-apache-hadoop-storage-for-fast-analytics-on-fast-data/
RecordService, a Hadoop component for fine-grained security enforcement:
http://blog.cloudera.com/blog/2015/09/recordservice-for-fine-grained-security-enforcement-across-the-hadoop-ecosystem/
The HDFS Erasure Coding feature
http://blog.cloudera.com/blog/2015/09/introduction-to-hdfs-erasure-coding-in-apache-hadoop/
spark-testing-base, a base library for Spark testing
http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/
How to use Impala to analyze unstructured data:
http://blog.cloudera.com/blog/2015/09/how-to-prepare-unstructured-data-in-impala-for-analysis/
Impala test results in BI scenarios
http://blog.cloudera.com/blog/2015/09/how-impala-scales-for-business-intelligence-new-test-results/
Untangling Apache Hadoop YARN:
http://blog.cloudera.com/blog/2015/09/untangling-apache-hadoop-yarn-part-1/
http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-users/
http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/
Dynamic progress reports for queries run in the Impala shell (Impala's debug webpages, port 25000):
http://blog.cloudera.com/blog/2015/09/dynamic-progress-reports-in-the-impala-shell/
Cloudera One Platform:
http://vision.cloudera.com/one-platform/

Hortonworks
Microsoft Azure HDInsight now supports Ubuntu Linux, with support up to Hadoop 2.6:
http://hortonworks.com/blog/microsoft-azure-hdinsight-on-linux-choice/
HDP 2.3 Sandbox now available on Microsoft Azure Gallery:
http://hortonworks.com/blog/hortonworks-sandbox-with-hdp-2-3-is-now-available-on-microsoft-azure-gallery/
Impala vs. Hive performance comparison
http://hortonworks.com/blog/impala-vs-hive-performance-benchmark/
An HDP migration case study:
http://hortonworks.com/blog/migration-to-hdp-as-easy-as-1-2-3-without-downtime-or-disruption/

MapR
Resource allocation configuration for Spark on YARN
https://www.mapr.com/blog/resource-allocation-configuration-spark-yarn#.VfoRrSWqqko
https://spark.apache.org/docs/latest/running-on-yarn.html
https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
Hybrid architecture of SAP HANA (Vora) and the MapR Data Platform
https://www.mapr.com/blog/sap-hana-vora-and-mapr-data-platform#.VfoRsCWqqko
Spark Streaming with HBase
https://www.mapr.com/blog/spark-streaming-hbase#.VfoR0SWqqko
MapR support for Docker
https://www.mapr.com/blog/how-create-instant-mapr-clusters-docker#.VfoR2CWqqko
https://www.mapr.com/blog/my-experience-running-docker-containers-on-mesos#.Vfoc_yWqqkp

Databricks
Spark Survey 2015 results:
https://databricks.com/blog/2015/09/24/spark-survey-results-2015-are-now-available.html
Spark code debugging: real-time progress bar and Spark UI
https://databricks.com/blog/2015/09/23/easier-spark-code-debugging-real-time-progress-bar-and-spark-ui-integration-in-databricks.html
Performance improvements to the LDA algorithm in Spark 1.5:
https://databricks.com/blog/2015/09/22/large-scale-topic-modeling-improvements-to-lda-on-spark.html
Spark 1.5 DataFrame API:
https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html
Spark 1.5 released, with major improvements in performance, usability, operations, and the Data Science APIs:
https://databricks.com/blog/2015/09/09/announcing-spark-1-5.html
http://www.csdn.net/article/2015-09-29/2825825
http://www.csdn.net/article/2015-09-10/2825669

MongoDB
Five simple steps to optimize MongoDB performance:
http://www.csdn.net/article/2015-09-30/2825833
MongoDB development version 3.1.8 released
http://www.csdn.net/article/2015-09-17/2825734
Development version 3.1.7 of the distributed document database MongoDB released
http://www.csdn.net/article/2015-09-01/2825599-mongodb-317-is-released

References
Rowing against the current: Spark moving forward
http://www.csdn.net/article/2015-09-21/2825754
Building Weidian's big data platform: practice and discussion
http://www.csdn.net/article/2015-09-21/2825756
Building a data-driven organization: year two
http://zhuanlan.zhihu.com/donglaoshi/20205116
Demystifying Growth Hacking (Part 1)
http://zhuanlan.zhihu.com/qinchao/20190015
JD.com big data infrastructure and practice (Wang Yanming)
http://share.csdn.net/slides/9138
JD.com data warehouse tool for massive data exchange (Zhang Kan)
http://share.csdn.net/slides/9137
JD.com big data analytics and innovative applications
http://share.csdn.net/slides/9139
A decade of LinkedIn architecture
http://engineering.linkedin.com/architecture/brief-history-scaling-linkedin
http://colobu.com/2015/07/24/brief-history-scaling-linkedin/
How LinkedIn improves Kafka
http://www.infoq.com/cn/articles/linkedIn-improving-kafka
http://engineering.linkedin.com/apache-kafka/how-we%E2%80%99re-improving-and-advancing-kafka-linkedin
Alibaba CDN: from in-house build to service
http://share.csdn.net/slides/8319
System architecture design: load balancing and high availability
http://share.csdn.net/slides/12338
OSTC2015: Zhu Zhaoyuan (Shudu) on Alibaba's open source experience
http://share.csdn.net/slides/13730
Voidbox
http://dongxicheng.org/mapreduce-nextgen/voidbox-docker-on-hadoop-hulu/
Understanding the Spark Streaming execution model in depth
http://www.csdn.net/article/2015-09-13/2825689
Introduction to the new features in Apache Spark 1.5
http://www.csdn.net/article/2015-09-10/2825669
A survey of the big data ecosystem's flourishing open source projects
http://www.csdn.net/article/2015-09-11/2825674
Example: setting up a project that integrates Redis with Spring
http://www.csdn.net/article/2015-09-01/2825600
Distributed parallel databases will accelerate the move away from Oracle in OLTP
http://www.csdn.net/article/2015-09-11/2825678
A brief review of Gartner's 2015 Hype Cycle for Emerging Technologies: big data becomes practical, machine learning rises
http://www.csdn.net/article/2015-09-06/2825620
Hortonworks acquires Onyara, launching data flow automation
http://www.csdn.net/article/2015-09-02/2825612