BigDataCamp LA 2013

Vote or submit your topic here


  1. Beyond the Batch: Real Time Analytics - John Carnahan of Ticketmaster

    One definition of "big" in Big Data is data sets that have too much mass or are too complex to support decisions without some meaningful reduction. Approaches to dimensionality reduction go by many names, including machine learning, pattern recognition, or even AI. A more recent definition of "big" is data sets large enough that batch processing is no longer sufficient. One reason batch processing may be insufficient is that it is simply not cost-effective to store all the data. This becomes a real concern with the explosion of incoming sensor data from personal devices. Another…

    95 votes
    • Hadoop Security - Vivek Shrivastava of Wipro Technologies

      Hadoop security is a hot topic for financial and PII data. This talk will provide:
      1. An overview of security in Hadoop
      2. Authentication, authorization, audit, and encryption
      3. Kerberos security
      4. Some of the ways to achieve column-level security

      I am an Architect in the Analytical and Information Management Practice (Big Data) at Wipro Technologies. In this role I am responsible for Wipro's banking and financial sector clients, and I am currently enabling a major bank to leverage the power of the Hadoop ecosystem.

      92 votes
      • Design Patterns for Big Data Architecture: Best Strategies for Streamlined [Simple, Powerful] Design - Allen Day of MapR

        The concerns of large-scale distributed computing now go far beyond storage solutions to include a wide range of big data analytics, machine learning, and interactive applications. The scale of projects is huge, the components vary from real-time to interactive to batch solutions, and the architecture may become very complex to accommodate these needs. How do you make the best choices to keep architectural design for these projects simple yet powerful? This presentation describes new innovations for key big data architecture design patterns, from the technical details to real-world use cases. Wouldn’t you like to be able to stream…

        85 votes
        • Inside Look at Shopzilla's Big Data Pipeline - Srinivasan Ponnuswamy & Subash D'Souza of Shopzilla

          Shopzilla's Big Data pipeline, built on top of Hadoop, uses a mix of Pig and Java for MapReduce and Spring Batch for workflow management.
          Fasttrack, as it is called, was built to replace the legacy system, ditrack, which was written in Perl. The legacy system had a number of issues, chief among them its inability to scale to high data volumes.
          Fasttrack has proven to be much more agile and nimble in terms of raw processing power. It has brought our reporting times down from 3-4 hours to about 15 minutes.

          In this…

          77 votes
          • Exploring Enron Email Dataset with Kiji and Hive - Lee Sheng of WibiData

            Apache Hive is a data warehousing system for large volumes of data stored in Hadoop that provides SQL-based access for exploring datasets. KijiSchema provides evolvable schemas of primitive and compound types on top of HBase. Integrating the two provides the best of both worlds: ad hoc SQL-based querying on top of datasets that use evolvable schemas containing complex objects. This talk will present examples of queries that use this integration to do exploratory analysis of the Enron email corpus. Delving into topics such as email responder pairs and sentiment analysis can expose many of the interesting points…

            76 votes
            • MapReduce Improvements in the MapR Hadoop Distribution - Adam Bordelon of MapR

              MapR is the only Hadoop distribution that provides full data protection, no single points of failure, improved performance, and dramatic ease of use advantages. Adam will dive into the features that provide improved performance and reliability over Hadoop. Basic outline:
              * Hadoop/MapReduce background
              * MapRFS vs. HDFS: FileSystem HA, Direct Access NFS, Volumes, Snapshots, Mirroring
              * MapReduce improvements: DirectShuffle, JT HA, ExpressLane
              * Beyond MapReduce: M7, YARN, etc.

              Senior Software Engineer, MapR Technologies
              Adam is the lead developer on MapR's Hadoop/MapReduce team, with responsibilities including MapReduce optimizations and YARN integration. Prior to working at MapR, Adam was at Amazon developing…

              68 votes
              • What the #### is Accumulo? - Eric Newton of SW Complete, Inc

                I will answer questions about Apache Accumulo, the BigTable clone originally created by the National Security Agency. I can go over some high-level history, which will emphasize technical hurdles, and talk in great detail about experiences with scaling up the architecture and software maturity. This will be an ad-hoc discussion, and not a prepared presentation. I won't be able to talk about the NSA, except to complain about the parking and the cafeteria.

                Eric Newton supports small teams using Accumulo, and has been a contributor since 2009. He has modified about 40% of the code.
                Accumulo has been used on…

                63 votes
                • Real-Time Streaming Analytics on HBase at Kelley Blue Book - Richard Larson of Kelley Blue Book

                  How Kelley Blue Book implemented a streaming analytics platform on top of HBase and HDFS in order to support real-time personalization for the consumers of its website.

                  This quick 30-minute talk will cover the key high-level elements of what we did, why we did it, what made this effort a success, and what to look out for when implementing similar platforms.

                  Kelley Blue Book is currently utilizing Hadoop and related "Big Data" technologies to implement platforms for real-time streaming analytics and data modeling. We are implementing a multi-layered architecture that embraces the concept of transformation, computation and aggregation on-the-fly…

                  43 votes
                  • Teradata’s Big Data Portfolio - Sudipta Burman of Teradata

                    An introduction to the Teradata Unified Data Architecture, Teradata Aster Big Analytics, Teradata Hadoop, SQL-MR (MapReduce), and SQL-H (Hadoop).

                    Sudipta Burman architects and implements Big Data (Hadoop) and advanced analytics solutions. He helps his sales team perform proofs of concept (POCs) and works with the engineering team on product enhancements and alliances.

                    41 votes
                    • Hadoop 2 and YARN - Bikas Saha of Hortonworks

                      The Apache Software Foundation recently announced the GA release of Hadoop 2. This is a pivotal release for Hadoop, including significant improvements to HDFS and a complete redesign of the Hadoop compute architecture with YARN. The talk will provide an overview of Hadoop 2 with a focus on YARN and the new opportunities in data processing that it opens up for the Big Data world.

                      Bikas Saha currently works on the Hadoop platform and other ecosystem components.

                      39 votes
                      • Beaconspec @ Hulu - Prasan Samtani of Hulu

                        The metrics team at Hulu is responsible for processing and analyzing a very large volume of data (~1 terabyte/day). Our data analytics platform at Hulu is built on Hadoop, an open-source framework that provides a distributed filesystem (HDFS) and a programming model for distributed computing (MapReduce). Most Hadoop users write these MapReduce jobs by hand in a programming language like Java or Python. However, the MapReduce programming model is unfamiliar to many conventional programmers and, in many cases, creates a lot of redundant verbosity that obscures the essence of the computation to be performed.

                        The approach…

                        28 votes
                        • Customer Lifetime Value and What Your Income Statement Isn’t Telling You - Andrew Calver of Gamefly

                          For any business that invests in customer acquisition, the return on that investment is hugely important. That ROI is found by calculating the lifetime value of customers versus the investment to acquire them. Lifetime value is measured as the revenue the customer has brought into the business through recurring subscription fees or purchases of goods, versus the costs to acquire and keep them. Visibility into those revenues and costs is difficult to get from a traditional income statement because revenue is layered in by many different acquisition investments that occurred in the past.
                          The Gamefly team will…

                          26 votes
                          • Impala: A Modern, Open-Source SQL Engine for Hadoop - Mark Grover of Cloudera

                            This is a technical deep dive about Cloudera Impala, the project that makes scalable parallel database technology available to the Hadoop community for the first time. Impala is an open-sourced code base that allows users to issue low-latency queries to data stored in HDFS and Apache HBase using familiar SQL operators.

                            I am currently a software engineer at Cloudera, contributing to the design and development of various open source technologies in the Hadoop ecosystem. Before Cloudera, I was a user of such technologies, building a large-scale distributed warehouse on Hadoop using Hive.

                            25 votes
                            • Extending Your Data Infrastructure with Hadoop - Mark Grover of Cloudera

                              Hadoop provides significant value when integrated with an existing data infrastructure, but even among Hadoop experts there's still confusion about options for data integration and business intelligence with Hadoop. This class will help clear up the confusion.

                              You will learn:
                              • How can I use Hadoop to complement and extend my data infrastructure?
                              • How can Hadoop complement my data warehouse?
                              • What are the capabilities and limitations of available tools?
                              • How do I get data into and out of Hadoop?
                              • How can I use my existing data integration and business intelligence tools with Hadoop?
                              • How can…

                              22 votes
                              • Hadoop Principles - Reza Madani of HarborObjects

                                The intent of this session is to give SQL Server developers an overview of key Hadoop concepts. This presentation will also cover the categories of big data that are most suitable for a Hadoop implementation. A few sample HDInsight batch queries for developers and some Azure details will also be covered.

                                I have worked on big data implementations in two distinct industries: (1) pharmaceutical (large amounts of device data), where relational DBs can't handle the volume; and (2) web-based marketing (dynamic nature of incoming data), where relational DBs are more static in nature.

                                22 votes
                                • Big Data on AWS - Lynn Langit of Lynn Langit Consulting

                                  Understanding AWS services that relate to data, from S3/Glacier and DynamoDB to specialized EC2 instances (RDS, Redshift, MapReduce). Also Data Pipelines.

                                  Lynn Langit is a Big Data architect. She has written (and helped implement) projects on Azure, AWS, Google, and Rackspace. Databases include SQL Server, MongoDB, Neo4j, and Hadoop.

                                  21 votes
                                  • Intro to R Workshop - Ray DiGiacomo, Jr. of Lion Data Systems

                                    This workshop would allow those interested in learning R to get a good first exposure to the technology. The workshop could include a PowerPoint Lecture and/or a hands-on lab where all attendees would run R on their laptops and follow along with me while I code.

                                    I am the President of the Orange County Hadoop User Group. We are now 328 members strong. Visit us at http://www.meetup.com/OC-HUG

                                    My personal focus on Big Data is around Healthcare Predictive Analytics.

                                    20 votes
                                    • Developer's Guide to Coprocessors - John Weatherford of Telescope

                                      This talk will cover all the specifics needed for a Java developer to start creating coprocessors. It will start with a brief introduction to what a coprocessor is and why it is useful, describing the difference between observers and endpoints and showing examples of implementations. An outline of the class structure will be given describing the coprocessor interface and example code will be shown for a simple data manipulation observer and data extraction endpoint. After the code is shown, it will discuss the steps needed to run the coprocessor on HBase as well as where to find log data and…

                                      17 votes
                                      • REEF: Retainable Evaluator Execution Framework - Tyson Condie of Microsoft

                                        With YARN, resource management has been decoupled from the programming model (MapReduce, in this case) for the first time in the Hadoop ecosystem. This not only solves an important scalability bottleneck in Hadoop, it also opens the door for a wide range of programming frameworks beyond MapReduce on Hadoop. It is well understood that while enticingly simple and fault tolerant, the MapReduce model is not ideal for many applications, especially iterative or recursive workloads like machine learning and graph processing; in general, workloads that benefit from main memory (as opposed to disk based) computation. There wide range of Big Data…

                                        12 votes
                                        • Teradata Portfolio for Hadoop - Peyman Mohajerian of Teradata

                                          Many large corporations with existing Data Warehousing solutions are incorporating Hadoop for many use cases, e.g. active archival, data staging, and analysis of unstructured data.
                                          There are several approaches to exchanging data between DW systems such as Teradata and Hadoop. Sqoop is the tool for passing data back and forth offline; in addition, Teradata has SQL-H, a SQL tool within Teradata that pulls data from Hadoop via a SQL SELECT statement.

                                          As part of the Big Data team in Teradata, we are helping our clients in adopting Hadoop as part of the Data Warehousing solution. We mainly…

                                          7 votes